In most sciences one generation tears down
what another has built and what one has 
established another undoes. In Mathematics 
alone each generation builds a new story to 
the old structure. 


— Hermann Hankel 

A peculiar beauty reigns in the realm of 
mathematics, a beauty which resembles not 
so much the beauty of art as the beauty of 
nature and which affects the reflective mind, 
which has acquired an appreciation of it, 
very much like the latter. 


Ernst Kummer 
Preface


It is often averred that two contrasting cultures coexist in mathematics — the theory- 
building culture and the problem-solving culture. The present volume was certainly 
spawned by the latter. This book takes an array of specific problems and solves 
them, with the needed tools developed along the way in the context of the particular 
problems. 

The book is an unusual hybrid. It treats a melange of topics from combinatorial 
probability theory, multiplicative number theory, random graph theory, and
combinatorics. Objectively, what the problems in this book have in common is that they
involve the asymptotic analysis of a discrete construct, as some natural parameter 
of the system tends to infinity. Subjectively, what these problems have in common 
is that both their statements and their solutions resonate aesthetically with me. 

The results in this book lend themselves to the title “Problems from the Finite to 
the Infinite”; however, with regard to the methods of proof, the chosen appellation 
is the more apt. In particular, generating functions in their various guises are 
a fundamental bridge “from the discrete to the continuous,” as the book’s title 
would have it; such functions work their magic often in these pages. Besides 
bridging discrete mathematics and mathematical analysis, the book makes a modest 
attempt at bridging disciplines — probability, number theory, graph theory, and 
combinatorics. 

In addition to the considerations mentioned above, the problems were selected 
with an eye toward accessibility to a wide audience, including advanced
undergraduate students. The technical prerequisites for the book are a good grounding in basic
undergraduate analysis, a touch of familiarity with combinatorics, and a little basic 
probability theory. One appendix provides the necessary probabilistic background, 
and another appendix provides a warm-up for dealing with generating functions. 
That said, a moderate dose of the elusive quality known as mathematical maturity 
will certainly be helpful throughout the text and will be necessary on occasion. 

The primary intent of the book is to introduce a number of beautiful problems in 
a variety of subjects quickly, pithily, and completely rigorously to graduate students 
and advanced undergraduates. The book could be used for a seminar/capstone 
course in which students present the lectures. It is hoped that the book might also be
of interest to mathematicians whose fields of expertise are away from the subjects
treated herein. In light of the primary intended audience, the level of detail in proofs 
is a bit greater than what one sometimes finds in graduate mathematics texts. 

I conclude with some brief comments on the novelty or lack thereof in the 
various chapters. A bit more information in this vein may be found in the chapter 
notes at the end of each chapter. Chapter 1 follows a standard approach to the 
problem it solves. The same is true for Chap. 2 except for the probabilistic proof 
of Theorem 2.1, which I haven’t seen in the literature. The packing problem 
result in Chap. 3 seems to be new, and the proof almost certainly is. My approach 
to the arcsine laws in Chap. 4 is somewhat different than the standard one; it 
exploits generating functions to the hilt and is almost completely combinatorial. 
The traditional method of proof is considerably more probabilistic. The proofs of 
the results in Chap. 5 on the distribution of cycles in random permutations are 
almost exclusively combinatorial, through the method of generating functions. In 
particular, the proof of Theorem 5.2 makes quite sophisticated use of this technique. 
In the setting of weighted permutations, it seems that the method of proof offered 
here cannot be found elsewhere. The number theoretic topics in Chaps. 6-8 are 
developed in a standard fashion, although the route has been streamlined a bit to 
provide a rapid approach to the primary goal, namely, the proof of the Hardy- 
Ramanujan theorem. In Chap. 9, the proof concerning the number of cliques in a 
random graph is more or less standard. The result on tampering detection constitutes 
material with a new twist and the methods are rather probabilistic; a little additional 
probabilistic background and sophistication on the part of the reader would be useful 
here. The results from Ramsey theory are presented in a standard way. Chapter 10, 
which deals with the phase transition concerning the giant component in a sparse 
random graph, is the most demanding technically. The reader with a modicum of 
probabilistic sophistication will be at quite an advantage here. It appears to me that 
a complete proof of the main results in this chapter, with all the details, is not to be 
found in the literature. 

Acknowledgements It is a pleasure to thank my editor, Donna Chernyk, for her professionalism 
and superb diligence. 


Haifa, Israel 
April 2014 
A Note on Notation


\(\mathbb{Z}\) denotes the set of integers

\(\mathbb{Z}^+\) denotes the set of nonnegative integers

\(\mathbb{N}\) denotes the set of natural numbers: \(\{1, 2, \ldots\}\)

\(\mathbb{R}\) denotes the set of real numbers

\(f(x) = O(g(x))\) as \(x \to a\) means that \(\limsup_{x \to a} \left|\frac{f(x)}{g(x)}\right| < \infty\); in particular,
\(f(x) = O(1)\) as \(x \to a\) means that \(f(x)\) remains bounded as \(x \to a\)

\(f(x) = o(g(x))\) as \(x \to a\) means that \(\lim_{x \to a} \frac{f(x)}{g(x)} = 0\); in particular, \(f(x) = o(1)\)
as \(x \to a\) means \(\lim_{x \to a} f(x) = 0\)

\(f \sim g\) as \(x \to a\) means that \(\lim_{x \to a} \frac{f(x)}{g(x)} = 1\)

\(\gcd(x_1, \ldots, x_m)\) denotes the greatest common divisor of the positive integers
\(x_1, \ldots, x_m\)

The symbol \([\cdot]\) is used in two contexts:

1. \([n] = \{1, 2, \ldots, n\}\), for \(n \in \mathbb{N}\)

2. \([x]\) is the greatest integer function; that is, \([x] = n\), if \(n \in \mathbb{Z}\) and \(n \le x < n + 1\)

\(\text{Bin}(n, p)\) is the binomial distribution with parameters \(n\) and \(p\)
\(\text{Pois}(\lambda)\) is the Poisson distribution with parameter \(\lambda\)
\(\text{Ber}(p)\) is the Bernoulli distribution with parameter \(p\)

\(X \sim \text{Bin}(n, p)\) means the random variable \(X\) is distributed according to the
distribution \(\text{Bin}(n, p)\)



Chapter 1 

Partitions with Restricted Summands 
or “the Money Changing Problem” 


Imagine a country with coins of denominations 5 cents, 13 cents, and 27 cents. How
many ways can you make change for $51,419.48? That is, how many solutions
\((b_1, b_2, b_3)\) are there to the equation \(5b_1 + 13b_2 + 27b_3 = 5{,}141{,}948\), with
the restriction that \(b_1, b_2, b_3\) be nonnegative integers? This is a specific case of
the following general problem. Fix \(m\) distinct, positive integers \(\{a_j\}_{j=1}^m\). Count the
number of solutions \((b_1, \ldots, b_m)\) with integral entries to the equation

\[b_1 a_1 + b_2 a_2 + \cdots + b_m a_m = n, \quad b_j \ge 0, \ j = 1, \ldots, m. \tag{1.1}\]

A partition of \(n\) is a sequence of integers \((x_1, \ldots, x_k)\), where \(k\) is a positive
integer, such that

\[\sum_{i=1}^{k} x_i = n \quad \text{and} \quad x_1 \ge x_2 \ge \cdots \ge x_k \ge 1.\]


Let \(P_n\) denote the number of different partitions of \(n\). The problem of obtaining an
asymptotic formula for \(P_n\) is celebrated and very difficult. It was solved in 1918 by
G.H. Hardy and S. Ramanujan, who proved that

\[P_n \sim \frac{1}{4\sqrt{3}\, n}\, e^{\pi \sqrt{2n/3}}, \quad \text{as } n \to \infty.\]


Now consider partitions of \(n\) where we restrict the values of the summands \(x_i\) above
to the set \(\{a_j\}_{j=1}^m\). Denote the number of such restricted partitions by \(P_n(\{a_j\}_{j=1}^m)\).
A moment’s thought reveals that the number of solutions to (1.1) is \(P_n(\{a_j\}_{j=1}^m)\).

Does there exist a solution to (1.1) for every sufficiently large integer \(n\)? And
if so, can one evaluate asymptotically the number of such solutions for large \(n\)?
Without posing any restrictions on \(\{a_j\}_{j=1}^m\), the answer to the first question is
negative. For example, if \(m = 3\) and \(a_1 = 5\), \(a_2 = 10\), \(a_3 = 30\), then clearly
there is no solution to (1.1) if \(5 \nmid n\). Indeed, it is clear that a necessary condition
for the existence of a solution for all large \(n\) is that \(\{a_j\}_{j=1}^m\) are relatively prime:
\(\gcd(a_1, \ldots, a_m) = 1\). This is the time to recall a well-known result concerning
solutions \((b_1, \ldots, b_m)\) with (not necessarily nonnegative) integral entries to the
equation \(b_1 a_1 + b_2 a_2 + \cdots + b_m a_m = n\). A fundamental theorem in algebra/number
theory states that there exists an integral solution to this equation for all \(n \in \mathbb{Z}\) if
and only if \(\gcd(a_1, \ldots, a_m) = 1\). This result has an elegant group theoretical proof.
We will prove that for all large \(n\), (1.1) has a solution \((b_1, \ldots, b_m)\) with integral
entries if and only if \(\gcd(a_1, \ldots, a_m) = 1\), and we will give a precise asymptotic
estimate for the number of such solutions for large \(n\).

Theorem 1.1. Let \(m \ge 2\) and let \(\{a_j\}_{j=1}^m\) be distinct, positive integers. Assume
that the greatest common divisor of \(\{a_j\}_{j=1}^m\) is 1: \(\gcd(a_1, \ldots, a_m) = 1\). Then for all
sufficiently large \(n\), there exists at least one integral solution to (1.1). Furthermore,
the number \(P_n(\{a_j\}_{j=1}^m)\) of such solutions satisfies

\[P_n(\{a_j\}_{j=1}^m) \sim \frac{n^{m-1}}{(m-1)! \prod_{j=1}^m a_j}, \quad \text{as } n \to \infty. \tag{1.2}\]


Remark. In particular, we see (not surprisingly) that for fixed \(m\) and sufficiently
large \(n\), the smaller the \(\{a_j\}_{j=1}^m\) are, the more solutions there are. We also see
that given \(m_1\) and \(\{a_j^{(1)}\}_{j=1}^{m_1}\) and given \(m_2\) and \(\{a_j^{(2)}\}_{j=1}^{m_2}\), with \(m_2 > m_1\), then
for sufficiently large \(n\) there will be more solutions for the latter set of parameters.
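As a numerical sanity check on (1.2), the following Python sketch counts the solutions to (1.1) exactly by dynamic programming and compares the count with the leading term. The function names `count_solutions` and `leading_term` are illustrative choices, not notation from the text, and the values of \(n\) are kept modest so that the loop runs quickly.

```python
from math import factorial, prod

def count_solutions(denoms, n):
    """Count nonnegative integer solutions b of sum(b_j * a_j) = n."""
    ways = [1] + [0] * n              # ways[t] = number of solutions with total t
    for a in denoms:                  # bring in one denomination at a time
        for total in range(a, n + 1):
            ways[total] += ways[total - a]
    return ways[n]

def leading_term(denoms, n):
    """Right hand side of (1.2): n^(m-1) / ((m-1)! * prod a_j)."""
    m = len(denoms)
    return n ** (m - 1) / (factorial(m - 1) * prod(denoms))

denoms = (5, 13, 27)
for n in (1_000, 10_000, 100_000):
    exact = count_solutions(denoms, n)
    # The ratio should approach 1 as n grows.
    print(n, exact, exact / leading_term(denoms, n))
```

The same routine returns 0 for the example \(a_1 = 5, a_2 = 10, a_3 = 30\) whenever \(5 \nmid n\), which matches the necessity of the gcd condition.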

Proof. We will prove the asymptotic estimate in (1.2), from which the first statement
of the theorem will also follow. Let \(h_n\) denote the number of solutions to (1.1). (For
the proof, the notation \(h_n\) will be a lot more convenient than \(P_n(\{a_j\}_{j=1}^m)\).) Thus,
we need to show that (1.2) holds with \(h_n\) in place of \(P_n(\{a_j\}_{j=1}^m)\). We define the
generating function of \(\{h_n\}_{n=0}^\infty\), where \(h_0 = 1\) counts the trivial solution
\(b_1 = \cdots = b_m = 0\) of (1.1) with \(n = 0\):

\[H(x) = \sum_{n=0}^\infty h_n x^n. \tag{1.3}\]

A simple, rough estimate shows that \(h_n \le \frac{n^m}{\prod_{j=1}^m a_j}\), from which it follows that the
power series on the right hand side of (1.3) converges for \(|x| < 1\). See Exercise 1.1.
It turns out that we can exhibit \(H\) explicitly. We demonstrate this for the case \(m = 2\),
from which the general case will become clear.

For \(k = 1, 2\), we have

\[\frac{1}{1 - x^{a_k}} = 1 + x^{a_k} + x^{2a_k} + x^{3a_k} + \cdots,\]

and the series converges absolutely for \(|x| < 1\). Thus,
\[\begin{aligned}
\frac{1}{(1 - x^{a_1})(1 - x^{a_2})} &= (1 + x^{a_1} + x^{2a_1} + x^{3a_1} + \cdots)(1 + x^{a_2} + x^{2a_2} + x^{3a_2} + \cdots) \\
&= (1 + x^{a_1} + x^{2a_1} + x^{3a_1} + \cdots) + (x^{a_2} + x^{a_1 + a_2} + x^{2a_1 + a_2} + x^{3a_1 + a_2} + \cdots) \\
&\quad + (x^{2a_2} + x^{a_1 + 2a_2} + x^{2a_1 + 2a_2} + x^{3a_1 + 2a_2} + \cdots) + \cdots.
\end{aligned} \tag{1.4}\]


A little thought now reveals that on the right hand side of (1.4), the number of
times the term \(x^n\) appears is the number of integral solutions \((b_1, b_2)\) to (1.1) with
\(m = 2\); that is, \(h_n\) is the coefficient of \(x^n\) on the right hand side of (1.4). So
\(H(x) = \frac{1}{(1 - x^{a_1})(1 - x^{a_2})}\). Clearly, the same argument works for all \(m\); thus we conclude that

\[H(x) = \frac{1}{(1 - x^{a_1})(1 - x^{a_2}) \cdots (1 - x^{a_m})}, \quad |x| < 1. \tag{1.5}\]
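The identification of \(h_n\) with the coefficient of \(x^n\) in (1.5) can be checked by machine for small cases. The sketch below multiplies truncated geometric series, as in (1.4), and compares the resulting coefficients with a direct enumeration of solutions; the helper names `series_coefficients` and `brute_force_count` are hypothetical, and the check is an illustration rather than a proof.

```python
from itertools import product

def series_coefficients(denoms, degree):
    """Coefficients of x^0..x^degree in the product of the series 1/(1 - x^a)."""
    coeffs = [1] + [0] * degree
    for a in denoms:
        # Truncated geometric series 1 + x^a + x^(2a) + ...
        geometric = [1 if k % a == 0 else 0 for k in range(degree + 1)]
        coeffs = [
            sum(coeffs[i] * geometric[k - i] for i in range(k + 1))
            for k in range(degree + 1)
        ]
    return coeffs

def brute_force_count(denoms, n):
    """Directly enumerate nonnegative solutions of sum(b_j * a_j) = n."""
    ranges = [range(n // a + 1) for a in denoms]
    return sum(
        1 for b in product(*ranges)
        if sum(bj * aj for bj, aj in zip(b, denoms)) == n
    )

denoms = (2, 3)
coeffs = series_coefficients(denoms, 20)
assert all(coeffs[n] == brute_force_count(denoms, n) for n in range(21))
print(coeffs)
```

Truncating each series at the target degree loses nothing, because terms of higher degree cannot contribute to the coefficients that are kept.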


We now begin an analysis of \(H\), as given in its closed form in (1.5), which will
lead us to the asymptotic behavior as \(n \to \infty\) of the coefficients \(h_n\) in its power
series representation in (1.3). Consider the polynomial

\[p(x) = (1 - x^{a_1})(1 - x^{a_2}) \cdots (1 - x^{a_m}).\]

For each \(k\), the roots of \(1 - x^{a_k}\) are the \(a_k\)th roots of unity: \(\{e^{2\pi i j / a_k}\}_{j=0}^{a_k - 1}\). Clearly 1 is a
root of \(p(x)\) of multiplicity \(m\). Because of the assumption that \(\gcd(a_1, \ldots, a_m) = 1\),
it follows that every other root of \(p(x)\) is of multiplicity less than \(m\); that is, there is
no complex number \(r\) that can be written in the form \(r = e^{2\pi i j_k / a_k}\), simultaneously for
\(k = 1, \ldots, m\), where \(1 \le j_k < a_k\). Indeed, if \(r\) can be written in the above form
for all \(k\), then it follows that \(\frac{j_k}{a_k}\) is independent of \(k\). In particular, \(a_k = \frac{j_k a_1}{j_1}\) for
\(k = 2, \ldots, m\). Since \(1 \le j_1 < a_1\), it follows that there is at least one prime factor
of \(a_1\) which is a factor of all of the \(a_k\), \(k = 2, \ldots, m\), and this contradicts the
assumption that \(\gcd(a_1, \ldots, a_m) = 1\).

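Denote the distinct roots of \(p(x)\) by \(1, r_2, \ldots, r_l\), and note from above that
\(|r_j| = 1\), for all \(j\). Let \(m_k\) denote the multiplicity of the root \(r_k\), for \(k = 2, \ldots, l\).
Also, note that \(p(0) = 1\). Then we can write

\[(1 - x^{a_1})(1 - x^{a_2}) \cdots (1 - x^{a_m}) = (1 - x)^m \left(1 - \frac{x}{r_2}\right)^{m_2} \cdots \left(1 - \frac{x}{r_l}\right)^{m_l}, \tag{1.6}\]

where \(1 \le m_j < m\), for \(j = 2, \ldots, l\).

In light of (1.5) and (1.6), we can write the generating function \(H(x)\) in the form

\[H(x) = \frac{1}{(1 - x)^m \left(1 - \frac{x}{r_2}\right)^{m_2} \cdots \left(1 - \frac{x}{r_l}\right)^{m_l}}. \tag{1.7}\]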
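By the method of partial fractions, we can rewrite \(H\) from (1.7) in the form

\[\begin{aligned}
H(x) = {} & \left(\frac{A_{11}}{(1 - x)^m} + \frac{A_{12}}{(1 - x)^{m-1}} + \cdots + \frac{A_{1m}}{1 - x}\right) \\
& + \left(\frac{A_{21}}{\left(1 - \frac{x}{r_2}\right)^{m_2}} + \cdots + \frac{A_{2 m_2}}{1 - \frac{x}{r_2}}\right) + \cdots + \left(\frac{A_{l1}}{\left(1 - \frac{x}{r_l}\right)^{m_l}} + \cdots + \frac{A_{l m_l}}{1 - \frac{x}{r_l}}\right).
\end{aligned} \tag{1.8}\]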


For positive integers \(k\), the function \(F(x) = (1 - x)^{-k}\) has the power series
expansion

\[(1 - x)^{-k} = \sum_{n=0}^\infty \binom{n + k - 1}{k - 1} x^n.\]

To prove this, just verify that \(\frac{F^{(n)}(0)}{n!} = \binom{n + k - 1}{k - 1}\). Thus, the first term on the right
hand side of (1.8) can be expanded as

\[\frac{A_{11}}{(1 - x)^m} = A_{11} \sum_{n=0}^\infty \binom{n + m - 1}{m - 1} x^n. \tag{1.9}\]


The coefficient of \(x^n\) on the right hand side above is

\[A_{11} \binom{n + m - 1}{m - 1} = A_{11} \frac{(n + m - 1)(n + m - 2) \cdots (n + 1)}{(m - 1)!} \sim A_{11} \frac{n^{m-1}}{(m - 1)!}, \quad \text{as } n \to \infty.\]
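The binomial expansion of \((1 - x)^{-k}\) and the asymptotic size of its coefficients can be checked with a few lines of Python. This sketch uses the fact that multiplying a power series by \(1/(1 - x)\) replaces its coefficients by their running sums; the function name `inverse_power_coefficients` is an illustrative choice.

```python
from math import comb, factorial

def inverse_power_coefficients(k, degree):
    """Coefficients of (1 - x)^(-k) up to x^degree, via k running sums."""
    coeffs = [1] + [0] * degree       # the constant series 1
    for _ in range(k):
        running, summed = 0, []
        for c in coeffs:              # multiply by 1/(1 - x)
            running += c
            summed.append(running)
        coeffs = summed
    return coeffs

k = 4
assert inverse_power_coefficients(k, 10) == [comb(n + k - 1, k - 1) for n in range(11)]

# The coefficient is asymptotic to n^(k-1) / (k-1)!; the ratio tends to 1.
for n in (10, 1_000, 100_000):
    print(n, comb(n + k - 1, k - 1) / (n ** (k - 1) / factorial(k - 1)))
```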


Every other term on the right hand side of (1.8) is of the form \(\frac{A}{\left(1 - \frac{x}{r}\right)^k}\), where
\(1 \le k < m\) and \(|r| = 1\). By the same argument as above, the coefficient of \(x^n\) in the
expansion for \(\frac{A}{\left(1 - \frac{x}{r}\right)^k}\) is asymptotic to \(\frac{A}{r^n} \frac{n^{k-1}}{(k-1)!}\) as \(n \to \infty\) (substitute \(\frac{x}{r}\) for \(x\) in the
appropriate series expansion). Thus, each of these terms is on a smaller order than
the coefficient of \(x^n\) in (1.9). We thereby conclude that the coefficient of \(x^n\) in \(H(x)\)
is asymptotic to \(A_{11} \frac{n^{m-1}}{(m-1)!}\) as \(n \to \infty\). By (1.3), this gives

\[h_n \sim A_{11} \frac{n^{m-1}}{(m - 1)!}, \quad \text{as } n \to \infty. \tag{1.10}\]


It remains to evaluate the constant \(A_{11}\). From (1.8), it follows that

\[H(x) = \frac{A_{11}}{(1 - x)^m} + O\!\left(\frac{1}{(1 - x)^{m-1}}\right), \quad \text{as } x \to 1.\]

Thus,

\[\lim_{x \to 1} (1 - x)^m H(x) = A_{11}. \tag{1.11}\]
But on the other hand, from (1.5), we have

\[(1 - x)^m H(x) = \frac{(1 - x)^m}{(1 - x^{a_1})(1 - x^{a_2}) \cdots (1 - x^{a_m})} = \prod_{j=1}^m \frac{x - 1}{x^{a_j} - 1}. \tag{1.12}\]

Since \((x^{a_j})'\big|_{x=1} = a_j x^{a_j - 1}\big|_{x=1} = a_j\), we conclude from (1.12) that

\[\lim_{x \to 1} (1 - x)^m H(x) = \frac{1}{\prod_{j=1}^m a_j}. \tag{1.13}\]

From (1.11) and (1.13) we obtain \(A_{11} = \frac{1}{\prod_{j=1}^m a_j}\), and thus from (1.10) we conclude
that \(h_n \sim \frac{n^{m-1}}{(m-1)! \prod_{j=1}^m a_j}\). □

Exercise 1.1. If \(b_1 a_1 + b_2 a_2 + \cdots + b_m a_m = n\), then of course \(b_j a_j \le n\), for
all \(j \in [m]\). Use this to obtain the following rough upper bound on the number of
solutions to (1.1): \(h_n \le \frac{n^m}{\prod_{j=1}^m a_j}\). Then use this estimate together with the third
“fundamental result” in Appendix B to show that the series defining \(H(x)\) in (1.3)
converges for \(|x| < 1\).

Exercise 1.2. Go through the proof of Theorem 1.1 and convince yourself that the
result of the theorem holds even if the integers \(\{a_j\}_{j=1}^m\) are not distinct. That is,
the number of solutions to (1.1) is asymptotic to the expression on the right hand side
of (1.2). Note though that the number of such solutions is not equal to \(P_n(\{a_j\}_{j=1}^m)\).
What is the leading asymptotic term as \(n \to \infty\) for the number of ways to make \(n\)
cents out of quarters and pennies, where one distinguishes the quarters by their mint
marks (“p” for Philadelphia, “d” for Denver, and “s” for San Francisco) but where
the pennies are not distinguished?

Exercise 1.3. In the case that \(d = \gcd(a_1, \ldots, a_m) > 1\), use Theorem 1.1 to
formulate and prove a corresponding result.


Exercise 1.4. A composition of \(n\) is an ordered sequence of positive integers
\((x_1, \ldots, x_k)\), where \(k\) is a positive integer, such that \(\sum_{i=1}^k x_i = n\). A favorite
method of combinatorists to calculate the size of some combinatorial object is to
find a bijection between the object in question and some other object whose size
is known. Let \(C_n\) denote the number of compositions of \(n\). To calculate \(C_n\), we
construct a bijection as follows. Consider a sequence of \(n\) dots in a row. Between
each pair of adjacent dots, choose to place or choose not to place a vertical line.
Consider the set of all possible dot and line combinations. (For example, if \(n = 5\),
here are two possible such combinations: [figure: dot and line combinations (1) and (2)].)

(a) Show that there are \(2^{n-1}\) dot and line combinations.

(b) Show that there is a bijection between the set of compositions of \(n\) and the set
of dot and line combinations.

(c) Conclude from (a) and (b) that \(C_n = 2^{n-1}\).
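For readers who want to experiment before attempting the exercise, the dot and line construction is easy to simulate. The sketch below reads each choice of lines as a composition and checks, for small \(n\), that the resulting compositions are distinct and number \(2^{n-1}\). The reading rule used here (a line closes the current summand) is one natural interpretation and an assumption of this sketch; it does not replace the bijection argument the exercise asks for.

```python
from itertools import product

def compositions_from_bars(n):
    """Compositions of n read off from all dot and line combinations."""
    result = []
    for bars in product((False, True), repeat=n - 1):
        parts, current = [], 1
        for bar in bars:
            if bar:               # a vertical line closes the current summand
                parts.append(current)
                current = 1
            else:                 # no line: the next dot joins the same summand
                current += 1
        parts.append(current)
        result.append(tuple(parts))
    return result

for n in range(1, 9):
    comps = compositions_from_bars(n)
    assert all(sum(c) == n for c in comps)
    assert len(set(comps)) == len(comps) == 2 ** (n - 1)

print(sorted(compositions_from_bars(4)))
```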
Exercise 1.5. Let \(C_n^{\{1,2\}}\) denote the number of compositions of \(n\) with summands
restricted to the integers 1 and 2, that is, compositions \((x_1, \ldots, x_k)\) of \(n\) with the
restriction that \(x_i \in \{1, 2\}\), for all \(i\). The series

\[F(x) := \frac{1}{1 - x - x^2} = \sum_{n=0}^\infty (x + x^2)^n \tag{1.14}\]

converges absolutely for \(|x| < \frac{\sqrt{5} - 1}{2}\), since \(|x + x^2| \le |x| + |x|^2 < 1\) if \(|x| < \frac{\sqrt{5} - 1}{2}\).

(a) Similar to the argument leading from (1.3) to (1.5), argue that \(C_n^{\{1,2\}}\) is the
coefficient of \(x^n\) in the power series expansion of \(F\).

(b) Show that \(F(x) = \sum_{n=0}^\infty f_n x^n\), where \(\{f_n\}_{n=0}^\infty\) is the Fibonacci sequence;
see (B.2) in Appendix B. (Hint: One has \((x + x^2) F(x) = F(x) - 1\).)

(c) Conclude from (a) and (b) that \(C_n^{\{1,2\}}\) is the \(n\)th Fibonacci number; thus,
from (B.10) in Appendix B,


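\[C_n^{\{1,2\}} = \frac{1}{\sqrt{5}} \left[\left(\frac{1 + \sqrt{5}}{2}\right)^{n+1} - \left(\frac{1 - \sqrt{5}}{2}\right)^{n+1}\right].\]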
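The claim of Exercise 1.5 can be explored numerically for small \(n\). The following sketch enumerates the compositions with summands in \(\{1, 2\}\) directly and compares the count with the coefficient of \(x^n\) obtained from the expansion (1.14). Both helper names are hypothetical, and the brute-force enumeration is only practical for small \(n\).

```python
from itertools import product
from math import comb

def count_compositions_12(n):
    """Count ordered sequences of 1s and 2s that sum to n, by enumeration."""
    return sum(
        1
        for k in range(n + 1)                    # k = number of summands
        for parts in product((1, 2), repeat=k)
        if sum(parts) == n
    )

def series_coefficient(n):
    """Coefficient of x^n in sum_{j>=0} (x + x^2)^j, as in (1.14)."""
    # (x + x^2)^j = x^j (1 + x)^j contributes comb(j, n - j) to x^n.
    return sum(comb(j, n - j) for j in range(n + 1))

counts = [count_compositions_12(n) for n in range(13)]
assert counts == [series_coefficient(n) for n in range(13)]
# Each count is the sum of the previous two, the Fibonacci recursion.
assert all(counts[n] == counts[n - 1] + counts[n - 2] for n in range(2, 13))
print(counts)
```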




Chapter Notes 

For a leisurely and folksy introduction to the use of generating functions in
combinatorics, see Wilf’s little book [34]. For a recent encyclopedic treatment, see
the book of Flajolet and Sedgewick [20]. The asymptotic formula for \(P_n\), noted at
the beginning of the chapter, was proved by Hardy and Ramanujan in [23]. For a
modern account, see [4]. The asymptotic estimate in Theorem 1.1 is due to I. Schur.
As noted in the text, this asymptotic formula also proves that (1.1) has a solution
for all sufficiently large \(n\). However, this latter fact can be proved more easily; see,
for example, Brauer [11]. Given \(\{a_j\}_{j=1}^m\), what is the exact minimal value of \(n_0\)
such that every \(n \ge n_0\) can be written in the form (1.1)? When \(m = 2\), the answer is
\((a_1 - 1)(a_2 - 1)\). A proof can be found in [34]. For \(m \ge 3\) the answer is not known.
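The value \((a_1 - 1)(a_2 - 1)\) for \(m = 2\) is easy to test by brute force. The sketch below searches for the largest non-representable integer up to a cutoff; the function names and the cutoff `search_limit` are assumptions of this illustration, and the search only certifies representability up to that cutoff.

```python
from math import gcd

def representable(n, a1, a2):
    """True if n = b1*a1 + b2*a2 for some nonnegative integers b1, b2."""
    return any((n - b1 * a1) % a2 == 0 for b1 in range(n // a1 + 1))

def minimal_n0(a1, a2, search_limit):
    """Smallest n0 such that every n in [n0, search_limit] is representable."""
    assert gcd(a1, a2) == 1
    n0 = 0
    for n in range(search_limit + 1):
        if not representable(n, a1, a2):
            n0 = n + 1            # n itself fails, so n0 must exceed n
    return n0

for a1, a2 in ((2, 3), (3, 7), (5, 13)):
    assert minimal_n0(a1, a2, 10 * a1 * a2) == (a1 - 1) * (a2 - 1)
    print(a1, a2, minimal_n0(a1, a2, 10 * a1 * a2))
```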


Chapter 2 

The Asymptotic Density of Relatively Prime 
Pairs and of Square-Free Numbers 


Pick a positive integer at random. What is the probability of it being even? As
stated, this question is not well posed, because there is no uniform probability
measure on the set \(\mathbb{N}\) of positive integers. However, what one can do is fix a
positive integer \(n\), and choose a number uniformly at random from the finite set
\([n] = \{1, \ldots, n\}\). Letting \(p_n\) denote the probability that the chosen number was
even, we have \(\lim_{n \to \infty} p_n = \frac{1}{2}\) and we say that the asymptotic density of even
numbers is equal to \(\frac{1}{2}\).

In this spirit, we ask: if one selects two positive integers at random, what is the
probability that they are relatively prime? Fixing \(n\), we choose two positive integers
uniformly at random from \([n]\). Of course, there are two natural ways to interpret this.
Do we choose a number uniformly at random from \([n]\) and then choose a second
number uniformly at random from the remaining \(n - 1\) integers, or, alternatively,
do we select the second number again from \([n]\), thereby allowing for doubles? The
answer is that it doesn’t matter, because under the second alternative the probability
of getting doubles is only \(\frac{1}{n}\), and this doesn’t affect the asymptotic probability. Here
is the theorem we will prove.

Theorem 2.1. Choose two integers uniformly at random from \([n]\). As \(n \to \infty\), the
asymptotic probability that they are relatively prime is \(\frac{6}{\pi^2} \approx 0.6079\).
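Before turning to the proofs, the statement of Theorem 2.1 can be explored empirically. The sketch below uses the second interpretation (the two numbers are chosen independently, so doubles are allowed) and computes the exact fraction of relatively prime pairs for a few values of \(n\); the function name `coprime_fraction` is an illustrative choice.

```python
from math import gcd, pi

def coprime_fraction(n):
    """Fraction of ordered pairs (j, k) in [n] x [n] with gcd(j, k) = 1."""
    coprime = sum(
        1
        for j in range(1, n + 1)
        for k in range(1, n + 1)
        if gcd(j, k) == 1
    )
    return coprime / n ** 2

for n in (10, 100, 500):
    # The fraction should drift toward 6 / pi^2 as n grows.
    print(n, coprime_fraction(n), 6 / pi ** 2)
```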

We will give two very different proofs of Theorem 2.1: one completely number
theoretic and one completely probabilistic. The number theoretic proof is elegant,
even a little magical. However, it does require the preparation of some basic number
theoretic tools, and it provides little intuition. The number theoretic proof gives the
asymptotic probability as \(\left(\sum_{n=1}^\infty \frac{1}{n^2}\right)^{-1}\). The well-known fact that \(\sum_{n=1}^\infty \frac{1}{n^2} = \frac{\pi^2}{6}\) is
proved in Appendix D. The probabilistic proof requires very little preparation; it is
enough to know just the most rudimentary notions from discrete probability theory:
probability space, event, and independence. A heuristic, non-rigorous version of
the probabilistic proof provides a lot of intuition, some of which the reader might
find obscured in the rigorous proof. The probabilistic proof gives the asymptotic
probability as \(\prod_{k=1}^\infty \left(1 - \frac{1}{p_k^2}\right)\), where \(\{p_k\}_{k=1}^\infty\) is an enumeration of the primes. One
then must use the Euler product formula to show that this is equal to \(\left(\sum_{n=1}^\infty \frac{1}{n^2}\right)^{-1}\).
We will first give the number theoretic proof and then give the heuristic and the
rigorous probabilistic proofs.

The number theoretic ideas we develop along the way to our first proof of 
Theorem 2.1 will bring us close to proving another result, which we now describe. 
Every positive integer \(n \ge 2\) can be factored uniquely as \(n = p_1^{k_1} \cdots p_m^{k_m}\), where
\(m \ge 1\), \(\{p_j\}_{j=1}^m\) are distinct primes, and \(k_j \in \mathbb{N}\), for \(j \in [m]\). If in this factorization,
one has \(k_j = 1\), for all \(j \in [m]\), then we say that \(n\) is square-free. Thus, an integer
\(n \ge 2\) is square-free if and only if it is of the form \(n = p_1 \cdots p_m\), where \(m \ge 1\) and
\(\{p_j\}_{j=1}^m\) are distinct primes. The integer 1 is also called square-free. There are 61
square-free positive integers that are no greater than 100: 

1,2,3,5,6,7,10,11,13,14,15,17,19,21,22,23,26,29,30,31,33,34,35,37,38,39,41,42,43, 

46,47,51,53,55,57,58,59,61,62,65,66,67,69,70,71,73,74,77,78,79,82,83,85,86, 

87,89,91,93,94,95,97. 


Let \(C_n = \{k : 1 \le k \le n,\ k \text{ is square-free}\}\). If \(\lim_{n \to \infty} \frac{|C_n|}{n}\) exists, we call
this limit the asymptotic density of square-free numbers. After giving the number
theoretic proof of Theorem 2.1, we will prove the following theorem.

Theorem 2.2. The asymptotic density of square-free integers is \(\frac{6}{\pi^2} \approx 0.6079\).


For the number theoretic proof of Theorem 2.1, the first alternative suggested
above in the second paragraph of this chapter will be more convenient. In fact, once
we have chosen the two distinct integers, it will be convenient to order them by size;
thus, we may consider the set \(B_n\) of all possible (and equally likely) outcomes to be

\[B_n = \{(j, k) : 1 \le j < k \le n\}.\]

Let \(A_n \subset B_n\) denote those pairs which are relatively prime:

\[A_n = \{(j, k) : 1 \le j < k \le n,\ \gcd(j, k) = 1\}.\]

Then the probability \(q_n\) that the two selected integers are relatively prime is

\[q_n = \frac{|A_n|}{|B_n|} = \frac{2|A_n|}{n(n - 1)}. \tag{2.1}\]


We proceed to develop a circle of ideas that will facilitate the calculation of
\(\lim_{n \to \infty} q_n\) and thus give a proof of Theorem 2.1. A function \(a : \mathbb{N} \to \mathbb{R}\) is called
an arithmetic function. The Möbius function \(\mu\) is the arithmetic function defined by

\[\mu(n) = \begin{cases} 1, & \text{if } n = 1; \\ (-1)^m, & \text{if } n = \prod_{j=1}^m p_j, \text{ where } \{p_j\}_{j=1}^m \text{ are distinct primes}; \\ 0, & \text{otherwise}. \end{cases}\]

Thus, for example, we have \(\mu(3) = -1\), \(\mu(15) = 1\), and \(\mu(12) = 0\).
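The definition of the Möbius function translates directly into a short routine. The following is a minimal sketch based on trial division; the name `mobius` is an illustrative choice, and the method is meant for small arguments only.

```python
def mobius(n):
    """Möbius function of a positive integer n, by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:        # p^2 divides the original n
                return 0
            result = -result      # one more distinct prime factor
        p += 1
    # Whatever is left above 1 is a single remaining prime factor.
    return -result if n > 1 else result

# The three examples from the text.
assert (mobius(3), mobius(15), mobius(12)) == (-1, 1, 0)
print([mobius(n) for n in range(1, 21)])
```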
Given arithmetic functions \(a\) and \(b\), we define their convolution \(a * b\) to be the
arithmetic function satisfying

\[(a * b)(n) = \sum_{d \mid n} a(d)\, b\!\left(\frac{n}{d}\right), \quad n \in \mathbb{N}.\]

Clearly, \(a * b = b * a\). The convolution arises naturally in the following context.
Define formally

\[f(x) = \sum_{n=1}^\infty \frac{a(n)}{n^x} \tag{2.2}\]

and

\[g(x) = \sum_{n=1}^\infty \frac{b(n)}{n^x}. \tag{2.3}\]

When we say “formally,” what we mean is that we ignore questions of convergence
and manipulate these infinite series according to the laws of addition, subtraction,
multiplication, and division, which are valid for series with a finite number of terms
and for absolutely convergent infinite series. Their formal product is given by

\[f(x) g(x) = \left(\sum_{d=1}^\infty \frac{a(d)}{d^x}\right) \left(\sum_{k=1}^\infty \frac{b(k)}{k^x}\right) = \sum_{d=1}^\infty \sum_{k=1}^\infty \frac{a(d) b(k)}{(dk)^x} = \sum_{n=1}^\infty \frac{1}{n^x} \sum_{d, k :\, dk = n} a(d) b(k) = \sum_{n=1}^\infty \frac{(a * b)(n)}{n^x}. \tag{2.4}\]


If the series on the right hand side of (2.2) and (2.3) are in fact absolutely convergent,
then the series on the right hand side of (2.4) is also absolutely convergent. In
such case, the equality \(\left(\sum_{n=1}^\infty \frac{a(n)}{n^x}\right) \left(\sum_{n=1}^\infty \frac{b(n)}{n^x}\right) = \sum_{n=1}^\infty \frac{(a * b)(n)}{n^x}\) is a rigorous
statement in mathematical analysis.

An arithmetic function \(a\) is called multiplicative if \(a(nm) = a(n) a(m)\) whenever
\(\gcd(n, m) = 1\). It follows that if \(a \not\equiv 0\) is multiplicative, then \(a(1) = 1\). If \(a \not\equiv 0\) is
multiplicative, then it is completely determined by its values on the prime powers;
indeed, if \(n = \prod_{j=1}^m p_j^{k_j}\) is the factorization of \(n\) into a product of distinct prime
powers, then \(a(n) = a\!\left(\prod_{j=1}^m p_j^{k_j}\right) = \prod_{j=1}^m a(p_j^{k_j})\).

It is trivial to verify that \(\mu\) is multiplicative. For the first proposition below, the
following lemma will be useful.

Lemma 2.1. The arithmetic function \(\sum_{d \mid n} \mu(d)\) is multiplicative.

Proof. Let \(n\) and \(m\) be positive integers satisfying \(\gcd(n, m) = 1\). We have

\[\left(\sum_{d_1 \mid n} \mu(d_1)\right) \left(\sum_{d_2 \mid m} \mu(d_2)\right) = \sum_{d_1 \mid n,\, d_2 \mid m} \mu(d_1) \mu(d_2) = \sum_{d_1 \mid n,\, d_2 \mid m} \mu(d_1 d_2) = \sum_{d \mid nm} \mu(d),\]

where the second equality follows from the fact that \(\mu\) is multiplicative and the fact
that if \(\gcd(n, m) = 1\), \(d_1 \mid n\) and \(d_2 \mid m\), then \(\gcd(d_1, d_2) = 1\), while the final equality
follows from the fact that if \(\gcd(n, m) = 1\) and \(d \mid nm\), then \(d\) can be written as
\(d = d_1 d_2\) for a unique pair \(d_1, d_2\) satisfying \(d_1 \mid n\) and \(d_2 \mid m\). (The reader should
verify these facts.) □

We introduce three more arithmetic functions that will be used in the sequel:

\[\mathbf{1}(n) = 1, \text{ for all } n; \qquad i(n) = n, \text{ for all } n; \qquad e(n) = \begin{cases} 1, & \text{if } n = 1; \\ 0, & \text{otherwise}. \end{cases}\]

Note that \(a * e = a\), for all \(a\), and that \((a * \mathbf{1})(n) = \sum_{d \mid n} a(d)\). A key result we
need is the Möbius inversion formula.

Proposition 2.1. Let \(a\) be an arithmetic function. Define \(b = a * \mathbf{1}\). Then \(a = b * \mu\).

Remark. Written out explicitly, the theorem asserts that if

\[b(n) := \sum_{d \mid n} a(d),\]

then \(a(n) = \sum_{d \mid n} b(d)\, \mu\!\left(\frac{n}{d}\right)\).

Proof. To prove the proposition, it suffices to prove that

\[\mathbf{1} * \mu = e. \tag{2.5}\]

Indeed, using this along with the easily verified associativity of the convolution, we
have

\[b * \mu = (a * \mathbf{1}) * \mu = a * (\mathbf{1} * \mu) = a * e = a.\]

We now prove (2.5). We have

\[(\mathbf{1} * \mu)(n) = (\mu * \mathbf{1})(n) = \sum_{d \mid n} \mu(d).\]

By Lemma 2.1, the function \(\sum_{d \mid n} \mu(d)\) is multiplicative. Clearly, the function \(e\)
is multiplicative. Obviously, \(e(1) = 1\) and \(e(p^k) = 0\), for any prime \(p\) and any
positive integer \(k\). We have \(\sum_{d \mid 1} \mu(d) = \mu(1) = 1\). Thus, since a nonzero,
multiplicative, arithmetic function is completely determined by its values on prime
powers, to complete the proof that \(\mathbf{1} * \mu = e\), it suffices to show that \(\sum_{d \mid p^k} \mu(d) = 0\).
We have \(\sum_{d \mid p^k} \mu(d) = \sum_{j=0}^k \mu(p^j) = \mu(1) + \mu(p) = 1 - 1 = 0\). □
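Both (2.5) and the Möbius inversion formula can be checked numerically on an initial segment of \(\mathbb{N}\). The sketch below represents arithmetic functions as Python callables; the helper names `divisors`, `convolve`, `one`, and `e`, as well as the test function \(a(n) = n^2 + 1\), are assumptions made for this illustration.

```python
def mobius(n):
    """Möbius function by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def convolve(a, b):
    """Return the convolution a * b as a function of n."""
    return lambda n: sum(a(d) * b(n // d) for d in divisors(n))

one = lambda n: 1                      # the function 1(n) = 1
e = lambda n: 1 if n == 1 else 0       # the identity for convolution

# (2.5): 1 * mu = e.
assert all(convolve(one, mobius)(n) == e(n) for n in range(1, 200))

# Moebius inversion: if b = a * 1, then b * mu recovers a.
a = lambda n: n * n + 1                # arbitrary test function (hypothetical)
b = convolve(a, one)
assert all(convolve(b, mobius)(n) == a(n) for n in range(1, 200))
```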

We introduce one final arithmetic function, the well-known Euler \(\varphi\)-function:

\[\varphi(n) = |\{j : 1 \le j \le n,\ \gcd(j, n) = 1\}|.\]

That is, \(\varphi(n)\) counts the number of positive integers less than or equal to \(n\) which
are relatively prime to \(n\). For our calculation of \(\lim_{n \to \infty} q_n\), we will use a result that
is a corollary of the following proposition.

Proposition 2.2. \(\varphi * \mathbf{1} = i\); that is,

\[\sum_{d \mid n} \varphi(d) = n.\]

From Proposition 2.2 and the Möbius inversion formula, the following corollary
is immediate.

Corollary 2.1. \(\mu * i = \varphi\); that is,

\[\sum_{d \mid n} \mu(d) \frac{n}{d} = \varphi(n).\]

Remark. For the proofs of Theorems 2.1 and 2.2, we do not need Proposition 2.2,
but only Corollary 2.1. In Exercise 2.1, the reader is guided through a direct proof of
the corollary. The proof also will reveal why the seemingly strange Möbius function
has such nice properties.

Proof of Proposition 2.2. Let \(d \mid n\). It is easy to see that \(\varphi(d)\) is equal to the number
of \(k \in [n]\) satisfying \(\gcd(k, n) = \frac{n}{d}\). Indeed, \(k \in [n]\) satisfies \(\gcd(k, n) = \frac{n}{d}\) if and
only if \(k = j\left(\frac{n}{d}\right)\), for some \(j \in [d]\) satisfying \(\gcd(d, j) = 1\). (The reader should
verify this.) Also, clearly, every \(k \in [n]\) satisfies \(\gcd(k, n) = \frac{n}{d}\) for some \(d \mid n\). The
proposition follows from these facts. □
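The counting argument in this proof, together with Corollary 2.1, can be checked by machine for small \(n\). The sketch below groups the elements of \([n]\) by their gcd with \(n\), as the proof does, and then verifies both identities on a range of values; all function names are illustrative choices.

```python
from collections import Counter
from math import gcd

def phi(n):
    """Euler's function: the number of j in [n] relatively prime to n."""
    return sum(1 for j in range(1, n + 1) if gcd(j, n) == 1)

def mobius(n):
    """Möbius function by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

# Proof idea for n = 12: exactly phi(d) elements k of [n] have gcd(k, n) = n/d.
n = 12
by_gcd = Counter(gcd(k, n) for k in range(1, n + 1))
assert all(by_gcd[n // d] == phi(d) for d in divisors(n))

for n in range(1, 200):
    assert sum(phi(d) for d in divisors(n)) == n                      # Proposition 2.2
    assert sum(mobius(d) * (n // d) for d in divisors(n)) == phi(n)   # Corollary 2.1
```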

Remark. For an alternative proof of Proposition 2.2, exactly in the spirit of 
Lemma 2.1 and the proof of (2.5), see Exercise 2.2. 

We are now in a position to prove Theorem 2.1. 

Number Theoretic Proof of Theorem 2.1. For each \(k \ge 2\), there are \(\varphi(k)\) integers \(j\)
satisfying \(1 \le j < k\) and \(\gcd(j, k) = 1\). Thus,

\[|A_n| = |\{(j, k) : 1 \le j < k \le n,\ \gcd(j, k) = 1\}| = \sum_{k=2}^n \varphi(k).\]
Therefore, from (2.1), we have

\[q_n = \frac{2 \sum_{k=2}^n \varphi(k)}{n(n - 1)}.\]

To calculate

\[\lim_{n \to \infty} q_n = \lim_{n \to \infty} \frac{2 \sum_{k=2}^n \varphi(k)}{n(n - 1)}, \tag{2.6}\]

we analyze the behavior of the sum \(\sum_{k=2}^n \varphi(k)\) for large \(n\).

Remark. The function \(\varphi\) can be written explicitly as

\[\varphi(n) = n \prod_{p \mid n} \left(1 - \frac{1}{p}\right), \quad n \ge 2, \tag{2.7}\]

where \(\prod_{p \mid n}\) indicates that the product is over all primes that divide \(n\); see
Exercise 2.3. However, this formula is of no help whatsoever for analyzing the above
sum.

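We will use Corollary 2.1 to analyze \(\sum_{k=1}^n \varphi(k)\). From Corollary 2.1, we have

\[\sum_{k=1}^n \varphi(k) = \sum_{k=1}^n \sum_{d \mid k} \mu(d) \frac{k}{d} = \sum_{k=1}^n \sum_{d d' = k} d' \mu(d) = \sum_{d=1}^n \mu(d) \sum_{d' \le \frac{n}{d}} d'.\]

Since \(\sum_{j=1}^m j = \frac{1}{2} m(m + 1)\), we have

\[\sum_{k=1}^n \varphi(k) = \sum_{d=1}^n \mu(d) \sum_{d' \le \frac{n}{d}} d' = \frac{1}{2} \sum_{d=1}^n \mu(d) \left[\frac{n}{d}\right] \left(\left[\frac{n}{d}\right] + 1\right). \tag{2.8}\]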


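We have \(\left[\frac{n}{d}\right]\left(\left[\frac{n}{d}\right] + 1\right) \le \frac{n}{d}\left(\frac{n}{d} + 1\right) = \left(\frac{n}{d}\right)^2 + \frac{n}{d}\), and
\(\left[\frac{n}{d}\right]\left(\left[\frac{n}{d}\right] + 1\right) \ge \left(\frac{n}{d} - 1\right)\frac{n}{d} = \left(\frac{n}{d}\right)^2 - \frac{n}{d}\); thus,

\[\left(\frac{n}{d}\right)^2 - \frac{n}{d} \le \left[\frac{n}{d}\right]\left(\left[\frac{n}{d}\right] + 1\right) \le \left(\frac{n}{d}\right)^2 + \frac{n}{d}.\]

Substituting this two-sided inequality in (2.8), we obtain

\[\frac{n^2}{2} \sum_{d=1}^n \frac{\mu(d)}{d^2} - \frac{n}{2} \sum_{d=1}^n \frac{1}{d} \le \sum_{k=1}^n \varphi(k) \le \frac{n^2}{2} \sum_{d=1}^n \frac{\mu(d)}{d^2} + \frac{n}{2} \sum_{d=1}^n \frac{1}{d}. \tag{2.9}\]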
Now

\[\sum_{d=1}^n \frac{1}{d} = 1 + \sum_{d=2}^n \frac{1}{d} \le 1 + \log n, \tag{2.10}\]

since the final sum is a lower Riemann sum for \(\int_1^n \frac{1}{x}\, dx\). From (2.9) and (2.10), we
obtain

\[\lim_{n \to \infty} \frac{\sum_{k=1}^n \varphi(k)}{n(n - 1)} = \frac{1}{2} \sum_{d=1}^\infty \frac{\mu(d)}{d^2}. \tag{2.11}\]


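It remains to evaluate \(\sum_{d=1}^\infty \frac{\mu(d)}{d^2}\). On the face of it, from the definition of \(\mu\),
it would seem very difficult to evaluate this explicitly. However, Möbius inversion
saves the day. Consider (2.2)-(2.4) with \(a = \mathbf{1}\) and \(b = \mu\) and with \(x = 2\). With
these choices, the right hand sides of (2.2) and (2.3) are absolutely convergent.
By (2.5), we have \(\mathbf{1} * \mu = e\); that is, \(a * b = e\). Therefore, we conclude from
(2.2)-(2.4) that

\[\left(\sum_{n=1}^\infty \frac{1}{n^2}\right) \left(\sum_{n=1}^\infty \frac{\mu(n)}{n^2}\right) = \sum_{n=1}^\infty \frac{e(n)}{n^2} = 1. \tag{2.12}\]

Recall the well-known formula

\[\sum_{n=1}^\infty \frac{1}{n^2} = \frac{\pi^2}{6}. \tag{2.13}\]

We give a completely elementary proof of this fact in Appendix D. From (2.12)
and (2.13) we obtain

\[\sum_{d=1}^\infty \frac{\mu(d)}{d^2} = \frac{6}{\pi^2}. \tag{2.14}\]

Using (2.14) with (2.11) and (2.6) gives

\[\lim_{n \to \infty} q_n = \frac{6}{\pi^2},\]

completing the proof of the theorem. □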

Remark. If \(a\) is an arithmetic function and \(f\) is a nondecreasing function, we
say that the function \(f\) is the average order of the arithmetic function \(a\) if
\(\frac{1}{n} \sum_{k=1}^n a(k) = f(n) + o(f(n))\). Of course this doesn’t uniquely define \(f\); we
usually choose a particular such \(f\) which has a simple form. From (2.11) and (2.14),
it follows that the average order of the Euler \(\varphi\)-function is \(\frac{3n}{\pi^2}\).
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We now turn to Theorem 2.2. 

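Proof of Theorem 2.2. From the definition of the Möbius function, it follows that

\[
\mu^2(n) = \begin{cases} 1, & \text{if } n \text{ is square-free};\\ 0, & \text{otherwise.} \end{cases} \tag{2.15}
\]

Thus, letting

\[
A_n = \{j \in [n] : j \text{ is square-free}\},
\]

we have

\[
|A_n| = \sum_{j=1}^{n}\mu^2(j). \tag{2.16}
\]

To prove the theorem, we need to show that

\[
\lim_{n\to\infty}\frac{|A_n|}{n} = \frac{6}{\pi^2}. \tag{2.17}
\]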


We need the following lemma.

Lemma 2.2.

\[
\mu^2(n) = \sum_{k^2 \mid n}\mu(k).
\]

Proof. Let $\Lambda(n) := \sum_{k^2 \mid n}\mu(k)$. If $n$ is square-free, then the only integer $k$ that satisfies $k^2 \mid n$ is $k = 1$. Thus, since $\mu(1) = 1$, we have $\Lambda(n) = 1$. On the other hand, if $n$ is not square-free, then $n$ can be written in the form $n = m^2 l$, where $m > 1$ and $l$ is square-free. Now $k^2 \mid m^2 l$ if and only if $k \mid m$. (The reader should verify this.) Thus, we have

\[
\Lambda(n) = \sum_{k^2 \mid n}\mu(k) = \sum_{k^2 \mid m^2 l}\mu(k) = \sum_{k \mid m}\mu(k) = 0,
\]

where the last equality follows from (2.5). The lemma now follows from (2.15). □
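Lemma 2.2 is easy to test by machine for small $n$. The sketch below is an illustration rather than part of the original text; `mobius` is a plain trial-division implementation and `square_divisor_sum` (a hypothetical name) evaluates the right hand side of the lemma.

```python
from math import isqrt


def mobius(n):
    """Moebius function mu(n), computed by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:  # p^2 divides the original n
                return 0
            result = -result
        p += 1
    if n > 1:  # one remaining prime factor
        result = -result
    return result


def square_divisor_sum(n):
    """Right hand side of Lemma 2.2: the sum of mu(k) over all k with k^2 | n."""
    return sum(mobius(k) for k in range(1, isqrt(n) + 1) if n % (k * k) == 0)


if __name__ == "__main__":
    for n in range(1, 21):
        print(n, mobius(n) ** 2, square_divisor_sum(n))
```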
Using Lemma 2.2, we have

\[
\sum_{j=1}^{n}\mu^2(j) = \sum_{j=1}^{n}\sum_{k^2 \mid j}\mu(k). \tag{2.18}
\]




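If $k^2 > n$, then $\mu(k)$ will not appear on the right hand side of (2.18). If $k^2 \le n$, then $\mu(k)$ will appear on the right hand side of (2.18) $[\frac{n}{k^2}]$ times, namely, when $j = k^2, 2k^2, \ldots, [\frac{n}{k^2}]k^2$. Thus, we have

\[
\sum_{j=1}^{n}\mu^2(j) = \sum_{j=1}^{n}\sum_{k^2 \mid j}\mu(k) = \sum_{k^2 \le n}\Big[\frac{n}{k^2}\Big]\mu(k) = \sum_{k \le [n^{1/2}]}\Big[\frac{n}{k^2}\Big]\mu(k)
= n\sum_{k \le [n^{1/2}]}\frac{\mu(k)}{k^2} + \sum_{k \le [n^{1/2}]}\Big(\Big[\frac{n}{k^2}\Big] - \frac{n}{k^2}\Big)\mu(k). \tag{2.19}
\]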


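Since each summand in the second term on the right hand side of (2.19) is bounded in absolute value by 1, we have

\[
\Big|\sum_{k \le [n^{1/2}]}\Big(\Big[\frac{n}{k^2}\Big] - \frac{n}{k^2}\Big)\mu(k)\Big| \le n^{1/2}. \tag{2.20}
\]

It follows from (2.16), (2.19), and (2.20) that

\[
\lim_{n\to\infty}\frac{|A_n|}{n} = \sum_{k=1}^{\infty}\frac{\mu(k)}{k^2}.
\]

Using this with (2.14) gives (2.17) and completes the proof of the theorem. □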
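Formula (2.19) also gives a fast way to count square-free numbers, since only about $n^{1/2}$ values of $\mu$ are needed. The following sketch is illustrative only (the helper names are assumptions, not from the text): it evaluates $\sum_{k \le [n^{1/2}]} [n/k^2]\mu(k)$ and compares it with a direct count and with $6n/\pi^2$.

```python
from math import isqrt, pi


def mobius_table(limit):
    """Return [mu(0), ..., mu(limit)] by sieving; mu(0) = 0 is a placeholder."""
    mu = [1] * (limit + 1)
    mu[0] = 0
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if is_prime[p]:
            for m in range(p, limit + 1, p):
                if m > p:
                    is_prime[m] = False
                mu[m] = -mu[m]  # one sign flip per distinct prime factor
            for m in range(p * p, limit + 1, p * p):
                mu[m] = 0       # not square-free
    return mu


def squarefree_count(n):
    """|A_n| computed as the sum over k <= sqrt(n) of mu(k) * floor(n / k^2)."""
    root = isqrt(n)
    mu = mobius_table(root)
    return sum(mu[k] * (n // (k * k)) for k in range(1, root + 1))


def squarefree_count_direct(n):
    """|A_n| by testing every j <= n for a square divisor (slow, for checking)."""
    return sum(1 for j in range(1, n + 1)
               if all(j % (k * k) for k in range(2, isqrt(j) + 1)))


if __name__ == "__main__":
    for n in (100, 10**4, 10**6):
        print(n, squarefree_count(n) / n, 6 / pi**2)
```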

We now give a heuristic probabilistic proof and a rigorous probabilistic proof of 
Theorem 2.1. In the heuristic proof, we put quotation marks around the steps that 
are not rigorous. 

Heuristic Probabilistic Proof of Theorem 2.1. Let $\{p_k\}_{k=1}^{\infty}$ be an enumeration of the primes. In the spirit described in the first paragraph of the chapter, if we pick a positive integer “at random,” then the “probability” of it being divisible by the prime number $p_k$ is $\frac{1}{p_k}$. (Of course, this is true also with $p_k$ replaced by an arbitrary positive integer.) If we pick two positive integers “independently,” then the “probability” that they are both divisible by $p_k$ is $\frac{1}{p_k}\cdot\frac{1}{p_k} = \frac{1}{p_k^2}$, by “independence.” So the “probability” that at least one of them is not divisible by $p_k$ is $1 - \frac{1}{p_k^2}$. The “probability” that a “randomly” selected positive integer is divisible by the two distinct primes, $p_j$ and $p_k$, is $\frac{1}{p_j p_k} = \frac{1}{p_j}\cdot\frac{1}{p_k}$. (The reader should check that this “holds” more generally if $p_j$ and $p_k$ are replaced by an arbitrary pair of relatively prime positive integers, but not otherwise.) Thus, the events of being divisible by $p_j$ and being divisible by $p_k$ are “independent.” Now two “randomly” selected positive integers are relatively prime if and only if, for every $k$, at least one of the integers is not divisible by $p_k$. But since the “probability” that at least one of them is not divisible by $p_k$ is $1 - \frac{1}{p_k^2}$, and since being divisible by a prime $p_j$ and being divisible by a different prime $p_k$ are “independent” events, the “probability” that the two “randomly” selected positive integers are such that, for every $k$, at least one of them is not divisible by $p_k$ is $\prod_{k=1}^{\infty}(1 - \frac{1}{p_k^2})$. Thus, this should be the “probability” that two “randomly” selected positive integers are relatively prime. □
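The product appearing at the end of the heuristic argument converges quickly, which can be seen by truncating it at the primes below some bound. The sketch below is an illustration under that truncation assumption; `primes_up_to` and `truncated_product` are hypothetical helper names.

```python
from math import isqrt, pi


def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes p <= limit."""
    if limit < 2:
        return []
    sieve = [True] * (limit + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, isqrt(limit) + 1):
        if sieve[p]:
            for m in range(p * p, limit + 1, p):
                sieve[m] = False
    return [p for p, flag in enumerate(sieve) if flag]


def truncated_product(limit):
    """Product of (1 - 1/p^2) over the primes p <= limit."""
    value = 1.0
    for p in primes_up_to(limit):
        value *= 1.0 - 1.0 / (p * p)
    return value


if __name__ == "__main__":
    for limit in (10, 100, 10**4):
        print(limit, truncated_product(limit), 6 / pi**2)
```

Every factor is smaller than 1, so the truncated products decrease toward the limit from above.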

Rigorous Probabilistic Proof of Theorem 2.1. For the probabilistic proof, the second alternative suggested in the second paragraph of the chapter will be more convenient. Thus, we choose an integer from $[n]$ uniformly at random and then choose a second integer from $[n]$ uniformly at random. Let $\Omega_n = [n]$. The appropriate probability space on which to analyze the model described above is the space $(\Omega_n \times \Omega_n, P_n)$, where the probability measure $P_n$ on $\Omega_n \times \Omega_n$ is the uniform measure; that is, $P_n(A) = \frac{|A|}{n^2}$ for any $A \subset \Omega_n \times \Omega_n$. The point $(i, j) \in \Omega_n \times \Omega_n$ indicates that the integer $i$ was chosen the first time and the integer $j$ was chosen the second time. Let $C_n$ denote the event that the two selected integers are relatively prime; that is,

\[
C_n = \{(i, j) \in \Omega_n \times \Omega_n : \gcd(i, j) = 1\}.
\]

Then the probability $q_n$ that the two selected integers are relatively prime is

\[
q_n = P_n(C_n) = \frac{|C_n|}{n^2}.
\]


Let $\{p_k\}_{k=1}^{\infty}$ denote the prime numbers arranged in increasing order. (Any enumeration of the primes would do, but for the proof it is more convenient to choose the increasing enumeration.) For each $k \in \mathbb{N}$, let $B^1_{n;k}$ denote the event that the first integer chosen is divisible by $p_k$ and let $B^2_{n;k}$ denote the event that the second integer chosen is divisible by $p_k$. That is,

\[
B^1_{n;k} = \{(i, j) \in \Omega_n \times \Omega_n : p_k \mid i\}, \qquad B^2_{n;k} = \{(i, j) \in \Omega_n \times \Omega_n : p_k \mid j\}.
\]

Note of course that the above sets are empty if $p_k > n$. The event $B^1_{n;k} \cap B^2_{n;k} = \{(i, j) \in \Omega_n \times \Omega_n : p_k \mid i \text{ and } p_k \mid j\}$ is the event that both selected integers have $p_k$ as a factor. There are $[\frac{n}{p_k}]$ integers in $\Omega_n$ that are divisible by $p_k$, namely, $p_k, 2p_k, \ldots, [\frac{n}{p_k}]p_k$. Thus, there are $[\frac{n}{p_k}]^2$ pairs $(i, j) \in \Omega_n \times \Omega_n$ for which both coordinates are divisible by $p_k$; therefore,

\[
P_n(B^1_{n;k} \cap B^2_{n;k}) = \frac{[\frac{n}{p_k}]^2}{n^2}. \tag{2.21}
\]


Note that $\cup_{k=1}^{\infty}(B^1_{n;k} \cap B^2_{n;k}) = \cup_{k=1}^{n}(B^1_{n;k} \cap B^2_{n;k})$ is the event that the two selected integers have at least one common prime factor. (The equality above follows from the fact that $B^1_{n;k}$ and $B^2_{n;k}$ are clearly empty for $k > n$.) Consequently, $C_n$ can be expressed as

\[
C_n = \big(\cup_{k=1}^{n}(B^1_{n;k} \cap B^2_{n;k})\big)^c = \cap_{k=1}^{n}(B^1_{n;k} \cap B^2_{n;k})^c,
\]

where $A^c := \Omega_n \times \Omega_n - A$ denotes the complement of an event $A \subset \Omega_n \times \Omega_n$. Thus,

\[
P_n(C_n) = P_n\big(\cap_{k=1}^{n}(B^1_{n;k} \cap B^2_{n;k})^c\big). \tag{2.22}
\]


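Let $R < n$ be a positive integer. We have

\[
\cap_{k=1}^{n}(B^1_{n;k} \cap B^2_{n;k})^c = \cap_{k=1}^{R}(B^1_{n;k} \cap B^2_{n;k})^c - \cup_{k=R+1}^{n}(B^1_{n;k} \cap B^2_{n;k})
\]

and, of course, $\cap_{k=1}^{n}(B^1_{n;k} \cap B^2_{n;k})^c \subset \cap_{k=1}^{R}(B^1_{n;k} \cap B^2_{n;k})^c$. Thus,

\[
P_n\big(\cap_{k=1}^{R}(B^1_{n;k} \cap B^2_{n;k})^c\big) - P_n\big(\cup_{k=R+1}^{n}(B^1_{n;k} \cap B^2_{n;k})\big) \le P_n\big(\cap_{k=1}^{n}(B^1_{n;k} \cap B^2_{n;k})^c\big) \le P_n\big(\cap_{k=1}^{R}(B^1_{n;k} \cap B^2_{n;k})^c\big). \tag{2.23}
\]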


Using the sub-additivity property of probability measures for the first inequality below, and using (2.21) for the equality below, we have

\[
P_n\big(\cup_{k=R+1}^{n}(B^1_{n;k} \cap B^2_{n;k})\big) \le \sum_{k=R+1}^{n} P_n(B^1_{n;k} \cap B^2_{n;k}) = \sum_{k=R+1}^{n}\frac{[\frac{n}{p_k}]^2}{n^2} \le \sum_{k=R+1}^{\infty}\frac{1}{p_k^2}. \tag{2.24}
\]


Up until now, we have made no assumption on $n$. Now assume that $p_k \mid n$, for $k = 1, \ldots, R$; that is, assume that $n$ is a multiple of $\prod_{k=1}^{R} p_k$. Denote the set of such $n$ by $D_R$; that is,

\[
D_R = \{n \in \mathbb{N} : p_k \mid n \text{ for } k = 1, \ldots, R\}.
\]

Recall that the event $B^1_{n;k} \cap B^2_{n;k}$ is the event that both selected integers are divisible by $p_k$. We claim that if $n \in D_R$, then the events $\{B^1_{n;k} \cap B^2_{n;k}\}_{k=1}^{R}$ are independent. That is, for any subset $I \subset \{1, 2, \ldots, R\}$, one has

\[
P_n\big(\cap_{k \in I}(B^1_{n;k} \cap B^2_{n;k})\big) = \prod_{k \in I} P_n(B^1_{n;k} \cap B^2_{n;k}), \quad \text{if } n \in D_R. \tag{2.25}
\]

The proof of (2.25) is a straightforward counting exercise and is left as Exercise 2.4. If events $\{A_k\}_{k=1}^{R}$ are independent, then the complementary events $\{A_k^c\}_{k=1}^{R}$ are also independent. See Exercise A.3 in Appendix A. Thus, we conclude that

\[
P_n\big(\cap_{k=1}^{R}(B^1_{n;k} \cap B^2_{n;k})^c\big) = \prod_{k=1}^{R} P_n\big((B^1_{n;k} \cap B^2_{n;k})^c\big), \quad \text{if } n \in D_R. \tag{2.26}
\]

By (2.21) we have

\[
P_n\big((B^1_{n;k} \cap B^2_{n;k})^c\big) = 1 - P_n(B^1_{n;k} \cap B^2_{n;k}) = 1 - \frac{[\frac{n}{p_k}]^2}{n^2}, \quad \text{for any } n.
\]

Thus, from the definition of $D_R$, we have, for $k = 1, \ldots, R$,

\[
P_n\big((B^1_{n;k} \cap B^2_{n;k})^c\big) = 1 - \frac{1}{p_k^2}, \quad \text{if } n \in D_R. \tag{2.27}
\]

From (2.22) to (2.24), (2.26), and (2.27), we conclude that

\[
\prod_{k=1}^{R}\Big(1 - \frac{1}{p_k^2}\Big) - \sum_{k=R+1}^{\infty}\frac{1}{p_k^2} \le P_n(C_n) \le \prod_{k=1}^{R}\Big(1 - \frac{1}{p_k^2}\Big), \quad \text{for } R \in \mathbb{N} \text{ and } n \in D_R. \tag{2.28}
\]
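The independence claim (2.25) can be sanity-checked with exact rational arithmetic before attempting Exercise 2.4. The sketch below is only an illustration (it is not a proof, and the function names are assumptions): it compares both sides of (2.25) for every subset of a small set of primes, once with $n \in D_R$ and once with $n \notin D_R$.

```python
from fractions import Fraction
from itertools import combinations
from math import prod


def joint_probability(n, primes):
    """P_n that both chosen integers in [n] are divisible by every prime in `primes`."""
    count = sum(1 for i in range(1, n + 1) if all(i % p == 0 for p in primes))
    return Fraction(count * count, n * n)


def is_independent(n, primes):
    """Check the product rule (2.25) for every subset with at least two primes."""
    for size in range(2, len(primes) + 1):
        for subset in combinations(primes, size):
            lhs = joint_probability(n, subset)
            rhs = prod((joint_probability(n, (p,)) for p in subset),
                       start=Fraction(1))
            if lhs != rhs:
                return False
    return True


if __name__ == "__main__":
    print(is_independent(30, (2, 3, 5)))  # 30 is a multiple of 2 * 3 * 5
    print(is_independent(31, (2, 3, 5)))  # 31 is not
```

For $n = 31$ the product rule already fails for the pair of primes 2 and 3, which illustrates why the argument first restricts to $n \in D_R$ and only afterwards passes to general $n$.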


We now use (2.28) to obtain an estimate on $P_n(C_n)$ for general $n$. Let $n \ge \prod_{k=1}^{R} p_k$. Let $n'$ denote the largest integer in $D_R$ which is smaller or equal to $n$, and let $n''$ denote the smallest integer in $D_R$ which is larger or equal to $n$. Since $D_R$ is the set of positive multiples of $\prod_{k=1}^{R} p_k$, we obviously have

\[
n' > n - \prod_{k=1}^{R} p_k \quad \text{and} \quad n'' < n + \prod_{k=1}^{R} p_k. \tag{2.29}
\]

For any $n$, note that $n^2 P_n(C_n) = |C_n|$ is the number of pairs $(i, j) \in \Omega_n \times \Omega_n$ that are relatively prime. Obviously, the number of such pairs is increasing in $n$. Thus $(n')^2 P_{n'}(C_{n'}) \le n^2 P_n(C_n) \le (n'')^2 P_{n''}(C_{n''})$, or equivalently,

\[
\Big(\frac{n'}{n}\Big)^2 P_{n'}(C_{n'}) \le P_n(C_n) \le \Big(\frac{n''}{n}\Big)^2 P_{n''}(C_{n''}). \tag{2.30}
\]

Since $n', n'' \in D_R$, we conclude from (2.28)-(2.30) that

\[
\Big(\frac{n - \prod_{k=1}^{R} p_k}{n}\Big)^2\Big(\prod_{k=1}^{R}\Big(1 - \frac{1}{p_k^2}\Big) - \sum_{k=R+1}^{\infty}\frac{1}{p_k^2}\Big) \le P_n(C_n) \le \Big(\frac{n + \prod_{k=1}^{R} p_k}{n}\Big)^2\prod_{k=1}^{R}\Big(1 - \frac{1}{p_k^2}\Big). \tag{2.31}
\]




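Letting $n \to \infty$ in (2.31), we obtain

\[
\prod_{k=1}^{R}\Big(1 - \frac{1}{p_k^2}\Big) - \sum_{k=R+1}^{\infty}\frac{1}{p_k^2} \le \liminf_{n\to\infty} P_n(C_n) \le \limsup_{n\to\infty} P_n(C_n) \le \prod_{k=1}^{R}\Big(1 - \frac{1}{p_k^2}\Big). \tag{2.32}
\]

Now (2.32) holds for arbitrary $R$; thus letting $R \to \infty$, we conclude that

\[
\lim_{n\to\infty} P_n(C_n) = \prod_{k=1}^{\infty}\Big(1 - \frac{1}{p_k^2}\Big). \tag{2.33}
\]

The celebrated Euler product formula states that

\[
\prod_{k=1}^{\infty}\frac{1}{1 - \frac{1}{p_k^r}} = \sum_{n=1}^{\infty}\frac{1}{n^r}, \quad r > 1; \tag{2.34}
\]

see Exercise 2.5. From (2.33), (2.34), and (2.13), we conclude that

\[
\lim_{n\to\infty} q_n = \lim_{n\to\infty} P_n(C_n) = \frac{1}{\sum_{n=1}^{\infty}\frac{1}{n^2}} = \frac{6}{\pi^2}.
\]

□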


Exercise 2.1. Give a direct proof of Corollary 2.1. (Hint: The Euler $\phi$-function $\phi(n)$ counts the number of positive integers that are less than or equal to $n$ and relatively prime to $n$. We employ the sieve method, which from the point of view of set theory is the method of inclusion-exclusion. Start with a list of all $n$ integers between 1 and $n$ as potential members of the set of the $\phi(n)$ relatively prime integers to $n$. Let $\{p_j\}_{j=1}^{m}$ be the prime divisors of $n$. For any such $p_j$, the $\frac{n}{p_j}$ numbers $p_j, 2p_j, \ldots, \frac{n}{p_j}p_j$ are not relatively prime to $n$. So we should strike these numbers from our list. When we do this for each $j$, the remaining numbers on the list are those numbers that are relatively prime to $n$, and the size of the list is $\phi(n)$. Now we haven’t necessarily reduced the size of our list to $N_1 := n - \sum_{j=1}^{m}\frac{n}{p_j}$, because some of the numbers we have deleted might be multiples of two different primes, $p_i$ and $p_j$, in which case they were subtracted above twice. Thus we need to add back to $N_1$ all of the $\frac{n}{p_i p_j}$ multiples of $p_i p_j$, for $i \ne j$. That is, we now have $N_2 := N_1 + \sum_{i < j}\frac{n}{p_i p_j}$. Continue in this vein.)

Exercise 2.2. This exercise presents an alternative proof to Proposition 2.2:

(a) Show that the arithmetic function $\sum_{d \mid n}\phi(d)$ is multiplicative. Use the fact that $\phi$ is multiplicative; see Exercise 2.3.

(b) Show that $\sum_{d \mid n}\phi(d) = n$, when $n$ is a prime power.

(c) Conclude that Proposition 2.2 holds.

Exercise 2.3. The Chinese remainder theorem states that if $n$ and $m$ are relatively prime positive integers, and $a \in [n]$ and $b \in [m]$, then there exists a unique $c \in [nm]$ such that $c = a \bmod n$ and $c = b \bmod m$. (For a proof, see [27].) Use this to prove that the Euler $\phi$-function is multiplicative. Then use the fact that $\phi$ is multiplicative to prove (2.7).

Exercise 2.4. Prove (2.25). 

Exercise 2.5. Prove the Euler product formula (2.34). (Hint: Let $N_\ell$ denote the set of positive integers all of whose prime factors are in the set $\{p_k\}_{k=1}^{\ell}$. Using the fact that

\[
\frac{1}{1 - \frac{1}{p_k^r}} = \sum_{m=0}^{\infty}\frac{1}{p_k^{rm}},
\]

for all $k \in \mathbb{N}$, first show that $\frac{1}{1 - \frac{1}{p_1^r}}\cdot\frac{1}{1 - \frac{1}{p_2^r}} = \sum_{n \in N_2}\frac{1}{n^r}$, and then show that $\prod_{k=1}^{\ell}\frac{1}{1 - \frac{1}{p_k^r}} = \sum_{n \in N_\ell}\frac{1}{n^r}$, for any $\ell \in \mathbb{N}$.)

Exercise 2.6. Using Theorem 2.1, prove the following result: Let $2 \le d \in \mathbb{N}$. Choose two integers uniformly at random from $[n]$. As $n \to \infty$, the asymptotic probability that their greatest common divisor is $d$ is $\frac{6}{d^2\pi^2}$.

Exercise 2.7. Give a probabilistic proof of Theorem 2.2. 


Chapter Notes 

It seems that Theorem 2.1 was first proven by E. Cesàro in 1881. A good source for the results in this chapter is Nathanson’s book [27]. See also the more advanced treatment of Tenenbaum [33], which contains many interesting and nontrivial exercises. The heuristic probabilistic proof of Theorem 2.1 is well known and can be found readily, including via a Google search. I am unaware of a rigorous probabilistic proof in the literature.


Chapter 3 

A One-Dimensional Probabilistic Packing Problem


Consider $n$ molecules lined up in a row. From among the $n - 1$ nearest neighbor pairs, select one pair at random and “bond” the two molecules together. Now from all the remaining nearest neighbor pairs, select one pair at random and bond the two molecules together. Continue like this until no nearest neighbor pairs remain. Let $M_{n;2}$ denote the random variable that counts the number of bonded molecules. Let $EM_{n;2}$ denote the expected value of $M_{n;2}$, that is, the average number of bonded molecules. The first thing we would like to do is to compute the limiting average fraction of bonded molecules: $\lim_{n\to\infty}\frac{EM_{n;2}}{n}$. Then we would like to show that $\frac{M_{n;2}}{n}$ is close to this limiting average with high probability as $n \to \infty$; that is, we would like to prove that $\frac{M_{n;2}}{n}$ satisfies the weak law of large numbers.

Of course, by definition, $EM_{n;2} = \sum_{j=0}^{n} j P(M_{n;2} = j)$, where $P(M_{n;2} = j)$ is the probability that $M_{n;2}$ is equal to $j$. However, it would be fruitless to pursue this formula to evaluate $EM_{n;2}$ asymptotically because the calculation of $P(M_{n;2} = j)$ is hopelessly complicated. We will solve the problem with the help of generating functions.

Actually, we will consider a slightly more general problem, where the pairs are replaced by $k$-tuples, for some $k \ge 2$. So the problem is as follows. There are $n$ molecules on a line. From among the $n - k + 1$ nearest neighbor $k$-tuples, select one at random and “bond” the $k$ molecules together. Now from among all the remaining nearest neighbor $k$-tuples, select one at random and bond the $k$ molecules together. Continue like this until there are no nearest neighbor $k$-tuples left. Let $M_{n;k}$ denote the random variable that counts the number of bonded molecules, and let $EM_{n;k}$ denote the expected value of $M_{n;k}$. See Fig. 3.1. Here is our result.

Theorem 3.1. For each integer $k \ge 2$,

\[
\lim_{n\to\infty}\frac{EM_{n;k}}{n} = k\exp\Big(-2\sum_{j=1}^{k-1}\frac{1}{j}\Big)\int_0^1\exp\Big(2\sum_{j=1}^{k-1}\frac{s^j}{j}\Big)\,ds =: p_k. \tag{3.1}
\]

Furthermore, $\frac{M_{n;k}}{n}$ satisfies the weak law of large numbers; that is, for all $\epsilon > 0$,

\[
\lim_{n\to\infty} P\Big(\Big|\frac{M_{n;k}}{n} - p_k\Big| \ge \epsilon\Big) = 0. \tag{3.2}
\]

Fig. 3.1 A realization with $n = 21$ and $k = 3$ that gives $M_{21;3} = 15$

R.G. Pinsky, Problems from the Discrete to the Continuous, Universitext, DOI 10.1007/978-3-319-07965-3_3, © Springer International Publishing Switzerland 2014


Remark 1. Only when $k = 2$ can $p_k$ be calculated explicitly; one obtains $p_2 = 1 - e^{-2} \approx 0.865$. Numerical integration gives $p_3 \approx 0.824$, $p_4 \approx 0.804$, $p_5 \approx 0.792$, $p_{10} \approx 0.770$, $p_{100} \approx 0.750$, $p_{1000} \approx 0.748$, and $p_{10{,}000} \approx 0.748$. The expression $p_k$ seems surprisingly difficult to analyze. We suggest the following open problem to the reader.

Open Problem. Prove that $p_k$ is monotone decreasing and calculate $\lim_{k\to\infty} p_k$.
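The numerical values quoted in Remark 1 can be reproduced with any standard quadrature rule. The sketch below is one possible way to do it and is not taken from the text: it uses composite Simpson's rule, and it folds the prefactor $\exp(-2\sum_{j=1}^{k-1} 1/j)$ into the integrand, writing $p_k = k\int_0^1 \exp(-2\sum_{j=1}^{k-1}\frac{1 - s^j}{j})\,ds$, so that no large exponentials appear. The function name and the default number of steps are assumptions.

```python
from math import exp


def packing_density(k, steps=2000):
    """Approximate p_k from (3.1) with composite Simpson's rule on [0, 1]."""
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of subintervals

    def integrand(s):
        # exp(-2 * sum 1/j) * exp(2 * sum s^j / j), combined into one exponent
        return exp(-2.0 * sum((1.0 - s**j) / j for j in range(1, k)))

    h = 1.0 / steps
    total = integrand(0.0) + integrand(1.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return k * total * h / 3.0


if __name__ == "__main__":
    for k in (2, 3, 4, 5, 10):
        print(k, round(packing_density(k), 4))
```

For large $k$ the integrand is concentrated near $s = 1$, so more steps would be needed there; the default above is only intended for small $k$.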

Remark 2. Any molecule that remains unbonded at the end of the nearest neighbor $k$-tuple bonding process occurs in a maximal row of $j$ unbonded molecules, for some $j \in [k - 1]$. In the limit as $n \to \infty$, what fraction of molecules ends up in a maximal row of $j$ unbonded molecules? See Exercise 3.2. (In Fig. 3.1, numbering from left to right, molecules #4 and #8 occur in a maximal row of one unbonded molecule, while molecules #15, #16, #20, and #21 occur in a maximal row of two unbonded molecules.)
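The bonding process is straightforward to simulate, which gives an independent check on Theorem 3.1. The sketch below is illustrative and makes one substitution that should be stated plainly: instead of repeatedly choosing uniformly among the $k$-tuples still available, it runs through all $n - k + 1$ starting positions in a uniformly random order and bonds a $k$-tuple whenever all of its molecules are still unbonded. The next accepted position in a random order is uniform among the currently available ones, so the two descriptions give the same distribution. The function names and the seed are assumptions.

```python
import random


def bonded_count(n, k, rng):
    """Run one realization of the k-tuple bonding process on n molecules."""
    starts = list(range(max(n - k + 1, 0)))  # left endpoints of all k-tuples
    rng.shuffle(starts)                      # random order = sequential uniform choice
    bonded = [False] * n
    total = 0
    for s in starts:
        if not any(bonded[s:s + k]):         # the k-tuple is still fully unbonded
            bonded[s:s + k] = [True] * k
            total += k
    return total


def mean_bonded_fraction(n, k, trials, seed=0):
    """Monte Carlo estimate of E M_{n;k} / n."""
    rng = random.Random(seed)
    return sum(bonded_count(n, k, rng) for _ in range(trials)) / (trials * n)


if __name__ == "__main__":
    for k in (2, 3, 4):
        print(k, mean_bonded_fraction(2000, k, trials=50))
```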

Proof. For notational convenience, let $H_n^{(k)} = EM_{n;k}$ and $L_n^{(k)} = EM_{n;k}^2$. To prove the theorem, it suffices to show that

\[
EM_{n;k} = H_n^{(k)} = p_k n + o(n), \quad \text{as } n \to \infty, \tag{3.3}
\]

and that

\[
EM_{n;k}^2 = L_n^{(k)} = p_k^2 n^2 + o(n^2), \quad \text{as } n \to \infty. \tag{3.4}
\]


This method of proof is known as the second moment method. It is clear that (3.1) follows from (3.3). An application of Chebyshev’s inequality shows that (3.2) follows from (3.3) and (3.4). To see this, note that if $Z$ is a random variable with expected value $EZ$ and variance $\sigma^2(Z)$, then Chebyshev’s inequality states that

\[
P(|Z - EZ| \ge \delta) \le \frac{\sigma^2(Z)}{\delta^2}, \quad \text{for any } \delta > 0.
\]

Also, $\sigma^2(Z) = EZ^2 - (EZ)^2$. We apply Chebyshev’s inequality with $Z = \frac{M_{n;k}}{n}$. Using (3.3) and (3.4), we have

\[
EZ = \frac{H_n^{(k)}}{n} = p_k + o(1), \quad \text{as } n \to \infty, \tag{3.5}
\]

and

\[
\sigma^2(Z) = \frac{L_n^{(k)}}{n^2} - \frac{(H_n^{(k)})^2}{n^2} = p_k^2 + o(1) - (p_k + o(1))^2 = o(1), \quad \text{as } n \to \infty.
\]

Thus, we obtain for all $\delta > 0$,

\[
P\Big(\Big|\frac{M_{n;k}}{n} - \frac{H_n^{(k)}}{n}\Big| \ge \delta\Big) \le \frac{\sigma^2(Z)}{\delta^2} \to 0, \quad \text{as } n \to \infty,
\]

or, equivalently,

\[
\lim_{n\to\infty} P\Big(\Big|\frac{M_{n;k}}{n} - \frac{H_n^{(k)}}{n}\Big| \ge \delta\Big) = 0, \quad \text{for all } \delta > 0. \tag{3.6}
\]


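We now show that (3.2) follows from (3.3) and (3.6). Fix $\epsilon > 0$. We have

\[
\Big|\frac{M_{n;k}}{n} - p_k\Big| = \Big|\frac{M_{n;k}}{n} - \frac{H_n^{(k)}}{n} + \frac{H_n^{(k)}}{n} - p_k\Big| \le \Big|\frac{M_{n;k}}{n} - \frac{H_n^{(k)}}{n}\Big| + \Big|\frac{H_n^{(k)}}{n} - p_k\Big|.
\]

For sufficiently large $n_\epsilon$, one has from (3.3) that $|\frac{H_n^{(k)}}{n} - p_k| < \frac{\epsilon}{2}$, for $n \ge n_\epsilon$. Thus, for $n \ge n_\epsilon$, a necessary condition for $|\frac{M_{n;k}}{n} - p_k| \ge \epsilon$ is that $|\frac{M_{n;k}}{n} - \frac{H_n^{(k)}}{n}| \ge \frac{\epsilon}{2}$. Consequently,

\[
P\Big(\Big|\frac{M_{n;k}}{n} - p_k\Big| \ge \epsilon\Big) \le P\Big(\Big|\frac{M_{n;k}}{n} - \frac{H_n^{(k)}}{n}\Big| \ge \frac{\epsilon}{2}\Big), \quad \text{for } n \ge n_\epsilon.
\]

Now (3.2) follows from this and (3.6).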

Our proofs of (3.3) and (3.4) will follow similar lines. Before commencing with the proof of (3.3), we trace its general architecture. Only the first step of the proof involves probability. In this step, we employ probabilistic reasoning to produce a recursion equation that gives $H_n^{(k)}$ in terms of $H_0^{(k)}, H_1^{(k)}, \ldots, H_{n-k}^{(k)}$. In this form, the equation is not useful because as $n \to \infty$, it gives $H_n^{(k)}$ in terms of a growing number of its predecessors. However, defining $S_n^{(k)} = \sum_{j=0}^{n} H_j^{(k)}$ and using the abovementioned recursion equation, we find that $S_n^{(k)}$ is given in terms of only two of its predecessors. We then construct the generating function $g(t)$ whose coefficients are $\{S_n^{(k)}\}_{n=0}^{\infty}$. Using the recursion equation for $S_n^{(k)}$, we show that $g$ solves a linear, first order differential equation. We solve this differential equation to obtain an explicit formula for $g(t)$. This explicit formula reveals that $g$ possesses a singularity at $t = 1$. Exploiting this singularity allows us to evaluate $\lim_{n\to\infty}\frac{S_n^{(k)}}{n^2}$, and then a simple observation allows us to obtain $\lim_{n\to\infty}\frac{H_n^{(k)}}{n}$ from $\lim_{n\to\infty}\frac{S_n^{(k)}}{n^2}$.

We now commence with the proof of (3.3). Note that if we start with $n < k$ molecules, then none of them will get bonded. Thus,

\[
H_n^{(k)} = 0, \quad \text{for } n = 0, \ldots, k - 1. \tag{3.7}
\]

We now derive a recursion relation for $H_n^{(k)}$. The method we use is called first step analysis. We begin with a line of $n \ge k$ unbonded molecules, and in the first step, one of the nearest neighbor $k$-tuples is chosen at random and its $k$ molecules are bonded. In order from left to right, denote the original $n - k + 1$ nearest neighbor $k$-tuples by $\{B_j\}_{j=1}^{n-k+1}$. If $B_j$ was chosen in the first step, then the original row now contains a row of $j - 1$ unbonded molecules to the left of the bonded $k$-tuple $B_j$ and a row of $n + 1 - j - k$ unbonded molecules to the right of $B_j$. To complete the random bonding process, we choose random $k$-tuples from these two sub-rows until there are no more $k$-tuples to choose from. This gives us the following formula for the conditional expectation of $M_{n;k}$ given that $B_j$ was selected first: for $n \ge k$,

\[
E(M_{n;k} \mid B_j \text{ selected first}) = k + E(M_{j-1;k} + M_{n+1-j-k;k}) = k + H_{j-1}^{(k)} + H_{n+1-j-k}^{(k)}. \tag{3.8}
\]


Of course, for each $j \in [n - k + 1]$, the probability that $B_j$ was chosen first is $\frac{1}{n-k+1}$. Thus, we obtain the formula

\[
EM_{n;k} = H_n^{(k)} = \sum_{j=1}^{n-k+1} P(B_j \text{ selected first})\,E(M_{n;k} \mid B_j \text{ selected first}) = k + \frac{1}{n-k+1}\sum_{j=1}^{n-k+1}\big(H_{j-1}^{(k)} + H_{n+1-j-k}^{(k)}\big).
\]

We can rewrite this as

\[
H_n^{(k)} = k + \frac{2}{n-k+1}\sum_{j=0}^{n-k} H_j^{(k)}, \quad n \ge k. \tag{3.9}
\]


The above recursion equation is not useful directly because it gives $H_n^{(k)}$ in terms of $n - k + 1$ of its predecessors; we want a recursion equation that expresses a given term in terms of a fixed finite number of its predecessors. To that end, we define

\[
S_n^{(k)} = \sum_{j=0}^{n} H_j^{(k)}. \tag{3.10}
\]

Substituting this in (3.9) gives

\[
H_n^{(k)} = k + \frac{2}{n-k+1}S_{n-k}^{(k)}, \quad n \ge k. \tag{3.11}
\]

Writing (3.7) and (3.11) in terms of $\{S_n^{(k)}\}_{n=0}^{\infty}$, we obtain

\[
S_n^{(k)} = 0, \quad \text{for } n = 0, \ldots, k - 1, \tag{3.12}
\]

and

\[
S_n^{(k)} - S_{n-1}^{(k)} = k + \frac{2}{n-k+1}S_{n-k}^{(k)}, \quad n \ge k. \tag{3.13}
\]

This recursion equation has the potential to be useful since it gives $S_n^{(k)}$ in terms of only two of its predecessors, $S_{n-1}^{(k)}$ and $S_{n-k}^{(k)}$. Of course, we have paid a price: we are now working with $S_n^{(k)}$ instead of $H_n^{(k)}$; but this will be dealt with easily. For convenience, we drop the superscript $k$ from $S_n^{(k)}$, $H_n^{(k)}$, and $L_n^{(k)}$ for the rest of the chapter, except in the statement of propositions. We rewrite (3.13) as

\[
(n - k + 1)S_n = (n - k + 1)S_{n-1} + 2S_{n-k} + k(n - k + 1), \quad n \ge k. \tag{3.14}
\]
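Recursions (3.11) and (3.13) make it cheap to compute $H_n$ exactly up to rounding, one term at a time. The sketch below is an illustration with assumed names; it builds $H_n$ and $S_n$ together and prints $H_n/n$, which by (3.3) should approach $p_k$ (for $k = 2$, the value $1 - e^{-2}$).

```python
from math import exp


def expected_bonded(n_max, k):
    """Return [H_0, ..., H_{n_max}] using H_n = k + 2 S_{n-k} / (n-k+1) for n >= k."""
    H = [0.0] * (n_max + 1)  # boundary condition (3.7): H_n = 0 for n < k
    S = [0.0] * (n_max + 1)  # partial sums, (3.12): S_n = 0 for n < k
    for n in range(k, n_max + 1):
        H[n] = k + 2.0 * S[n - k] / (n - k + 1)  # equation (3.11)
        S[n] = S[n - 1] + H[n]                   # definition (3.10)
    return H


if __name__ == "__main__":
    H = expected_bonded(2000, 2)
    for n in (10, 100, 2000):
        print(n, H[n] / n, 1 - exp(-2))
```

As a small check by hand: with $k = 2$ and $n = 4$ there are three pairs; choosing the middle pair (probability $\frac13$) bonds 2 molecules, and choosing an end pair (probability $\frac23$) eventually bonds all 4, so $H_4 = \frac{10}{3}$, in agreement with the recursion.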

We now define the generating function for $\{S_n\}_{n=0}^{\infty}$ and use (3.14) to derive a linear, first-order differential equation that is satisfied by this generating function. The generating function $g(t)$ is defined by

\[
g(t) = \sum_{n=0}^{\infty} S_n t^n = \sum_{n=k}^{\infty} S_n t^n, \tag{3.15}
\]

where the second equality follows from (3.12). From the definitions, it follows that $H_n \le n$, and thus $S_n \le \frac12 n(n + 1)$. Consequently, the sum on the right hand side of (3.15) converges for $|t| < 1$, with the convergence being uniform for $|t| \le \rho$, for any $\rho \in (0, 1)$. It follows then that

\[
g'(t) = \sum_{n=k}^{\infty} n S_n t^{n-1}, \quad |t| < 1. \tag{3.16}
\]

Multiply equation (3.14) by $t^n$ and group the terms in the following way:

\[
nS_n t^n - (k - 1)S_n t^n = (n - 1)S_{n-1}t^n - (k - 2)S_{n-1}t^n + 2S_{n-k}t^n + k(n - k + 1)t^n.
\]

Now summing the equation over all $n \ge k$, and appealing to (3.15), (3.16), and (3.12), we obtain the differential equation

\[
tg'(t) - (k - 1)g(t) = t^2 g'(t) - (k - 2)tg(t) + 2t^k g(t) + kt\sum_{n=k}^{\infty} nt^{n-1} - k(k - 1)\sum_{n=k}^{\infty} t^n. \tag{3.17}
\]

Since $\sum_{n=k}^{\infty} nt^{n-1}$ is the derivative of $\sum_{n=k}^{\infty} t^n = \frac{t^k}{1-t}$, it follows that $\sum_{n=k}^{\infty} nt^{n-1} = \frac{(1-t)kt^{k-1} + t^k}{(1-t)^2}$. Using these facts and doing some algebra, which leads to many cancelations, we obtain

\[
kt\sum_{n=k}^{\infty} nt^{n-1} - k(k - 1)\sum_{n=k}^{\infty} t^n = \frac{kt^k}{(1-t)^2}. \tag{3.18}
\]

Substituting this in (3.17), and doing a little algebra, we obtain

\[
g'(t) = \frac{(k - 1) - (k - 2)t + 2t^k}{t(1 - t)}\,g(t) + \frac{kt^{k-1}}{(1 - t)^3}, \quad 0 < t < 1. \tag{3.19}
\]

Note that we have excluded $t = 0$ because we have divided by $t$.

There are two singularities in the above equation: one at $t = 0$ and one at $t = 1$. The singularity at $t = 0$ is removable; indeed, $g(0) = 0$ so the first term on the right hand side of (3.19) can be defined at 0. The singularity at 1, on the other hand, is authentic, and actually contains the solution to our problem; we will just need to “unzip” it.

The linear, first-order differential equation in (3.19) is written in the form $g'(t) = a(t)g(t) + b(t)$, where

\[
a(t) = \frac{(k - 1) - (k - 2)t + 2t^k}{t(1 - t)}, \qquad b(t) = \frac{kt^{k-1}}{(1 - t)^3}. \tag{3.20}
\]

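Let $\epsilon \in (0, 1)$ and rewrite the differential equation as

\[
\big(g(t)e^{-\int_\epsilon^t a(s)\,ds}\big)' = b(t)e^{-\int_\epsilon^t a(s)\,ds}.
\]

Integrating from $\epsilon$ to $t \in (\epsilon, 1)$ gives

\[
g(t)e^{-\int_\epsilon^t a(r)\,dr} = g(\epsilon) + \int_\epsilon^t b(s)e^{-\int_\epsilon^s a(r)\,dr}\,ds, \quad t \in (\epsilon, 1),
\]

which we rewrite as

\[
g(t) = g(\epsilon)e^{\int_\epsilon^t a(r)\,dr} + \int_\epsilon^t b(s)e^{\int_s^t a(r)\,dr}\,ds, \quad t \in (\epsilon, 1). \tag{3.21}
\]

Since $\lim_{t\to 0} ta(t) = k - 1$, there exists a $t_0 > 0$ such that $a(t) \le \frac{k - \frac12}{t}$, for $0 < t \le t_0$. Thus, for $\epsilon < t_0$, one has

\[
e^{\int_\epsilon^{t_0} a(r)\,dr} \le \Big(\frac{t_0}{\epsilon}\Big)^{k - \frac12}.
\]

By (3.15) we have $g(\epsilon) = O(\epsilon^k)$ as $\epsilon \to 0$. Therefore,

\[
\lim_{\epsilon\to 0} g(\epsilon)e^{\int_\epsilon^t a(r)\,dr} = \lim_{\epsilon\to 0} g(\epsilon)e^{\int_\epsilon^{t_0} a(r)\,dr}e^{\int_{t_0}^t a(r)\,dr} \le e^{\int_{t_0}^t a(r)\,dr}\lim_{\epsilon\to 0} g(\epsilon)\Big(\frac{t_0}{\epsilon}\Big)^{k - \frac12} = 0.
\]

Thus, letting $\epsilon \to 0$ in (3.21) gives

\[
g(t) = \int_0^t b(s)e^{\int_s^t a(r)\,dr}\,ds, \quad 0 \le t < 1. \tag{3.22}
\]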


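Using partial fractions, one finds that

\[
\frac{(k - 1) - (k - 2)r}{r(1 - r)} = \frac{k - 1}{r} + \frac{1}{1 - r}.
\]

We also have

\[
\frac{r^{k-1}}{1 - r} = \frac{1}{1 - r} - (1 + r + \cdots + r^{k-2}).
\]

Thus, we can rewrite $a(r)$ from (3.20) as

\[
a(r) = \frac{k - 1}{r} + \frac{3}{1 - r} - 2(1 + r + \cdots + r^{k-2}).
\]

We then obtain

\[
\int_s^t a(r)\,dr = (k - 1)\log\frac{t}{s} - 3\log\frac{1 - t}{1 - s} - 2\sum_{j=1}^{k-1}\frac{t^j - s^j}{j},
\]

and thus

\[
e^{\int_s^t a(r)\,dr} = \Big(t^{k-1}(1 - t)^{-3}e^{-2\sum_{j=1}^{k-1}\frac{t^j}{j}}\Big)\Big(s^{1-k}(1 - s)^3 e^{2\sum_{j=1}^{k-1}\frac{s^j}{j}}\Big). \tag{3.23}
\]

Substituting this in (3.22) and recalling the definition of $b$ from (3.20), we obtain

\[
g(t) = \frac{t^{k-1}}{(1 - t)^3}\,e^{-2\sum_{j=1}^{k-1}\frac{t^j}{j}}\int_0^t k\,e^{2\sum_{j=1}^{k-1}\frac{s^j}{j}}\,ds. \tag{3.24}
\]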


We see that $g$ has a third-order singularity at $t = 1$. We proceed to “unzip” this singularity to reveal the answer to our problem.
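The closed form (3.24) can be compared numerically with the power series (3.15), whose coefficients come from the recursion (3.13). The sketch below is an illustrative check, not a derivation: the series is truncated after a fixed number of terms, the integral in (3.24) is approximated by composite Simpson's rule, and all function names and default parameters are assumptions.

```python
from math import exp


def partial_sums(n_max, k):
    """S_0, ..., S_{n_max} from (3.12) and (3.13)."""
    S = [0.0] * (n_max + 1)
    for n in range(k, n_max + 1):
        S[n] = S[n - 1] + k + 2.0 * S[n - k] / (n - k + 1)
    return S


def g_series(t, k, terms=400):
    """Truncated generating function: the sum of S_n t^n for n <= terms."""
    S = partial_sums(terms, k)
    return sum(S[n] * t**n for n in range(k, terms + 1))


def g_closed(t, k, steps=2000):
    """Formula (3.24); the integral over [0, t] uses Simpson's rule (steps is even)."""
    def weight(s):
        return exp(2.0 * sum(s**j / j for j in range(1, k)))

    h = t / steps
    total = weight(0.0) + weight(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * weight(i * h)
    integral = total * h / 3.0
    return k * t**(k - 1) / (1.0 - t)**3 / weight(t) * integral


if __name__ == "__main__":
    for k in (2, 3, 4):
        print(k, g_series(0.5, k), g_closed(0.5, k))
    # Near t = 1 the quantity (1 - t)^3 g(t) should be close to p_k.
    print((1 - 0.999)**3 * g_closed(0.999, 2), 1 - exp(-2))
```

For $k = 2$ the integral can be done by hand and (3.24) reduces to $g(t) = \frac{t(1 - e^{-2t})}{(1-t)^3}$, which gives a convenient exact reference value.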

We have the following proposition which connects the limiting behavior of $H_n$ with that of $S_n$.

Proposition 3.1.

\[
\lim_{n\to\infty}\frac{H_n^{(k)}}{n} = \ell
\]

if and only if

\[
\lim_{n\to\infty}\frac{S_n^{(k)}}{n^2} = \frac{\ell}{2}.
\]

Proof. The proof is immediate from (3.11). □


And we have the following proposition which connects the limiting behavior of $S_n$ with the singularity in $g$ at $t = 1$.

Proposition 3.2. If

\[
\lim_{n\to\infty}\frac{S_n^{(k)}}{n^2} = L,
\]

then

\[
\lim_{t\to 1}(1 - t)^3 g(t) = 2L.
\]

Proof. Since $\lim_{n\to\infty}\frac{S_n}{n^2} = L$, we also have $\lim_{n\to\infty}\frac{S_n}{n(n-1)} = L$. Let $\epsilon > 0$. Choose $n_0$ such that $|\frac{S_n}{n(n-1)} - L| \le \epsilon$, for $n > n_0$. Then recalling (3.15), we have, for $0 \le t < 1$,

\[
\sum_{n=0}^{n_0} S_n t^n + (L - \epsilon)\sum_{n=n_0+1}^{\infty} n(n - 1)t^n \le g(t) \le \sum_{n=0}^{n_0} S_n t^n + (L + \epsilon)\sum_{n=n_0+1}^{\infty} n(n - 1)t^n. \tag{3.25}
\]

Now

\[
\sum_{n=0}^{\infty} n(n - 1)t^n = t^2\Big(\sum_{n=0}^{\infty} t^n\Big)'' = t^2\Big(\frac{1}{1 - t}\Big)'' = \frac{2t^2}{(1 - t)^3},
\]

so

\[
\sum_{n=n_0+1}^{\infty} n(n - 1)t^n = \frac{2t^2}{(1 - t)^3} - \sum_{n=0}^{n_0} n(n - 1)t^n.
\]

Substituting this latter equality in (3.25), multiplying by $(1 - t)^3$, and letting $t \to 1$, we obtain

\[
2L - 2\epsilon \le \liminf_{t\to 1}(1 - t)^3 g(t) \le \limsup_{t\to 1}(1 - t)^3 g(t) \le 2L + 2\epsilon.
\]

As $\epsilon > 0$ is arbitrary, the proposition follows. □

In order to exploit Propositions 3.1 and 3.2, we will establish the existence of the limit $\lim_{n\to\infty}\frac{S_n}{n^2}$.

Proposition 3.3. $\lim_{n\to\infty}\frac{S_n^{(k)}}{n^2}$ exists.

Proof. Rewriting the recursion equation for $S_n$ in (3.13) so that only $S_n$ appears on the left hand side, then dividing both sides by $n^2$ and subtracting $\frac{S_{n-1}}{(n-1)^2}$ from both sides, we have

\[
\begin{aligned}
\frac{S_n}{n^2} - \frac{S_{n-1}}{(n-1)^2}
&= \frac{k}{n^2} + \frac{S_{n-1}}{n^2} - \frac{S_{n-1}}{(n-1)^2} + \frac{2S_{n-k}}{n^2(n - k + 1)} \\
&= \frac{k}{n^2} - \frac{2n - 1}{n^2(n-1)^2}S_{n-1} + \frac{2S_{n-k}}{n^2(n - k + 1)} \\
&= \frac{k}{n^2} - \frac{2n - 1}{n^2(n-1)^2}S_{n-1} + \frac{2S_{n-1}}{n^2(n - k + 1)} + \frac{2}{n^2(n - k + 1)}(S_{n-k} - S_{n-1}) \\
&= \frac{k}{n^2} + \frac{(2k - 5)n + 3 - k}{n^2(n-1)^2(n - k + 1)}S_{n-1} - \frac{2}{n^2(n - k + 1)}(H_{n-k+1} + \cdots + H_{n-1}).
\end{aligned} \tag{3.26}
\]

As already noted, from the definitions, we have $H_l \le l$ and $S_l \le \frac12 l(l + 1)$. Thus, there exists a $C > 0$ such that

\[
\Big|\frac{(2k - 5)n + 3 - k}{n^2(n-1)^2(n - k + 1)}S_{n-1}\Big| \le \frac{C}{n^2} \quad \text{and} \quad \frac{2}{n^2(n - k + 1)}(H_{n-k+1} + \cdots + H_{n-1}) \le \frac{C}{n^2}. \tag{3.27}
\]

This shows that the right hand side of (3.26) is $O(\frac{1}{n^2})$ and thus so is the left hand side. Consequently, the telescopic series $\sum_{n=2}^{\infty}\big(\frac{S_n}{n^2} - \frac{S_{n-1}}{(n-1)^2}\big)$ is convergent. Since

\[
\frac{S_n}{n^2} = S_1 + \sum_{j=2}^{n}\Big(\frac{S_j}{j^2} - \frac{S_{j-1}}{(j-1)^2}\Big),
\]

we conclude that $\lim_{n\to\infty}\frac{S_n}{n^2}$ exists. □

By Propositions 3.1 and 3.3, $\ell := \lim_{n\to\infty}\frac{H_n}{n}$ exists. Then by Propositions 3.1 and 3.2 (with $L = \frac{\ell}{2}$), it follows that

\[
\lim_{t\to 1}(1 - t)^3 g(t) = \ell.
\]

However, from the explicit formula for $g$ in (3.24), we have

\[
\lim_{t\to 1}(1 - t)^3 g(t) = k\,e^{-2\sum_{j=1}^{k-1}\frac{1}{j}}\int_0^1 e^{2\sum_{j=1}^{k-1}\frac{s^j}{j}}\,ds = p_k.
\]

Thus, $\ell = p_k$, completing the proof of (3.3).

We now turn to the proof of (3.4). We derive a formula by the method used to obtain (3.8). Recall the discussion preceding (3.8). Note that conditioned on $B_j$ being chosen on the first step, the final state of the $j - 1$ molecules to the left of $B_j$ and the final state of the $n + 1 - j - k$ molecules to the right of $B_j$ are independent of one another. Let $M_{j-1;k;1}$ and $M_{n+1-j-k;k;2}$ be independent random variables distributed according to the distributions of $M_{j-1;k}$ and $M_{n+1-j-k;k}$, respectively. Then similar to (3.8), we have

\[
E(M_{n;k}^2 \mid B_j \text{ selected first}) = E(k + M_{j-1;k;1} + M_{n+1-j-k;k;2})^2
= k^2 + L_{j-1} + L_{n+1-j-k} + 2kH_{j-1} + 2kH_{n+1-j-k} + 2H_{j-1}H_{n+1-j-k}, \tag{3.28}
\]

where the last term comes from the fact that the independence gives

\[
EM_{j-1;k;1}M_{n+1-j-k;k;2} = EM_{j-1;k;1}\,EM_{n+1-j-k;k;2}.
\]

Thus, similar to the passage from (3.8) to (3.9), we have

\[
L_n = k^2 + \frac{2}{n - k + 1}\sum_{j=0}^{n-k} L_j + \frac{4k}{n - k + 1}\sum_{j=0}^{n-k} H_j + \frac{2}{n - k + 1}\sum_{j=0}^{n-k} H_j H_{n-k-j}, \quad \text{for } n \ge k. \tag{3.29}
\]


We simplify the above recursion relation by defining

\[
R_n = \sum_{j=0}^{n} L_j. \tag{3.30}
\]

Of course, we have $L_n = 0$, for $n = 0, \ldots, k - 1$, and thus,

\[
R_n = 0, \quad \text{for } n = 0, \ldots, k - 1.
\]

Recalling (3.10), we can now rewrite (3.29) in the form

\[
R_n = R_{n-1} + k^2 + \frac{2}{n - k + 1}R_{n-k} + \frac{4k}{n - k + 1}S_{n-k} + \frac{2}{n - k + 1}\sum_{j=0}^{n-k} H_j H_{n-k-j}, \quad n \ge k. \tag{3.31}
\]

Proposition 3.4.

\[
\lim_{n\to\infty}\frac{1}{n^3}\sum_{j=0}^{n-k} H_j^{(k)}H_{n-k-j}^{(k)} = \frac{p_k^2}{6}.
\]


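Proof. Let $\epsilon > 0$. Since $\lim_{n\to\infty}\frac{H_n}{n} = p_k$, we can find an $n_\epsilon$ such that $(p_k - \epsilon)n \le H_n \le (p_k + \epsilon)n$, for $n > n_\epsilon$. Thus

\[
(p_k - \epsilon)^2\sum_{n_\epsilon < j < n - n_\epsilon - k} j(n - k - j) + \sum_{\substack{0 \le j \le n_\epsilon, \\ n - n_\epsilon - k \le j \le n - k}} H_j H_{n-k-j} \le \sum_{j=0}^{n-k} H_j H_{n-k-j}
\le (p_k + \epsilon)^2\sum_{n_\epsilon < j < n - n_\epsilon - k} j(n - k - j) + \sum_{\substack{0 \le j \le n_\epsilon, \\ n - n_\epsilon - k \le j \le n - k}} H_j H_{n-k-j}. \tag{3.32}
\]

Since $H_j \le j$, for all $j$, we have

\[
\sum_{\substack{0 \le j \le n_\epsilon, \\ n - n_\epsilon - k \le j \le n - k}} H_j H_{n-k-j} \le 2(n_\epsilon + 1)n_\epsilon n. \tag{3.33}
\]

(There are $2(n_\epsilon + 1)$ summands on the left hand side of (3.33), and each summand, $H_j H_{n-k-j}$, is less than or equal to $n_\epsilon n$.) Using the identity $\sum_{j=1}^{n} j^2 = \frac16 n(n + 1)(2n + 1)$, we have

\[
\begin{aligned}
\sum_{1 \le j < n - n_\epsilon - k} j(n - k - j) &= (n - k)\sum_{1 \le j < n - n_\epsilon - k} j - \sum_{1 \le j < n - n_\epsilon - k} j^2 \\
&= \frac12(n - k)(n - n_\epsilon - k - 1)(n - n_\epsilon - k) \\
&\quad - \frac16(n - n_\epsilon - k - 1)(n - n_\epsilon - k)\big(2(n - n_\epsilon - k - 1) + 1\big) \\
&= \frac16 n^3 + o(n^3), \quad \text{as } n \to \infty.
\end{aligned} \tag{3.34}
\]

Of course,

\[
\sum_{1 \le j \le n_\epsilon} j(n - k - j) \le n\sum_{1 \le j \le n_\epsilon} j \le \frac12 n\,n_\epsilon(n_\epsilon + 1). \tag{3.35}
\]

From (3.32)-(3.35), we conclude that

\[
\frac16(p_k - \epsilon)^2 \le \liminf_{n\to\infty}\frac{1}{n^3}\sum_{j=0}^{n-k} H_j H_{n-k-j} \le \limsup_{n\to\infty}\frac{1}{n^3}\sum_{j=0}^{n-k} H_j H_{n-k-j} \le \frac16(p_k + \epsilon)^2,
\]

which completes the proof, since $\epsilon > 0$ is arbitrary. □

We can rewrite (3.31) as

\[
R_n = \frac{n - k + 3}{n - k + 1}R_{n-1} + k^2 - \frac{2}{n - k + 1}(L_{n-k+1} + \cdots + L_{n-1}) + \frac{4k}{n - k + 1}S_{n-k} + \frac{2}{n - k + 1}\sum_{j=0}^{n-k} H_j H_{n-k-j}.
\]

Since $L_j \le j^2$ and $S_j \le \frac12 j(j + 1)$, we conclude from Proposition 3.4 that $R_n$ satisfies an equation of the form

\[
R_n = \frac{n - k + 3}{n - k + 1}R_{n-1} + W_n, \quad \text{where } W_n \text{ satisfies } \lim_{n\to\infty}\frac{W_n}{n^2} = \frac{p_k^2}{3}. \tag{3.36}
\]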


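In Exercise 3.1 the reader is asked to show that if for some $n_0$, the positive sequence $\{R_n\}_{n=n_0}^{\infty}$ satisfies $R_n \le \frac{n - k + 3}{n - k + 1}R_{n-1} + cn^2$ ($R_n \ge \frac{n - k + 3}{n - k + 1}R_{n-1} + cn^2$), then $\limsup_{n\to\infty}\frac{R_n}{n^3} \le c$ ($\liminf_{n\to\infty}\frac{R_n}{n^3} \ge c$). Using this with (3.36), we conclude that

\[
\lim_{n\to\infty}\frac{R_n}{n^3} = \frac{p_k^2}{3}. \tag{3.37}
\]

Writing (3.31) in the form

\[
L_n = k^2 + \frac{2}{n - k + 1}R_{n-k} + \frac{4k}{n - k + 1}S_{n-k} + \frac{2}{n - k + 1}\sum_{j=0}^{n-k} H_j H_{n-k-j}, \quad n \ge k, \tag{3.38}
\]

dividing both sides of this equation by $n^2$, and using (3.37), Proposition 3.4, and the fact that $S_n$ is on the order $n^2$, we conclude that

\[
\lim_{n\to\infty}\frac{L_n}{n^2} = \frac{2p_k^2}{3} + \frac{p_k^2}{3} = p_k^2.
\]

This gives (3.4) and completes the proof of Theorem 3.1. □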
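The second moment recursion (3.29) can be iterated alongside (3.9) to watch the variance of $M_{n;k}/n$ shrink, which is the content of (3.3) and (3.4) taken together. The sketch below is illustrative and uses assumed names; the convolution term makes it quadratic in $n$, so it is meant for moderate $n$ only.

```python
from math import exp


def first_two_moments(n_max, k):
    """Return (H, L) with H[n] = E M_{n;k} and L[n] = E M_{n;k}^2 for n <= n_max."""
    H = [0.0] * (n_max + 1)
    L = [0.0] * (n_max + 1)
    S = [0.0] * (n_max + 1)  # partial sums of H, as in (3.10)
    R = [0.0] * (n_max + 1)  # partial sums of L, as in (3.30)
    for n in range(k, n_max + 1):
        m = n - k
        conv = sum(H[j] * H[m - j] for j in range(m + 1))
        H[n] = k + 2.0 * S[m] / (m + 1)                              # (3.9)
        L[n] = k * k + (2.0 * R[m] + 4.0 * k * S[m] + 2.0 * conv) / (m + 1)  # (3.29)
        S[n] = S[n - 1] + H[n]
        R[n] = R[n - 1] + L[n]
    return H, L


if __name__ == "__main__":
    H, L = first_two_moments(400, 2)
    p2 = 1 - exp(-2)
    for n in (50, 100, 400):
        variance_of_fraction = (L[n] - H[n] ** 2) / n**2
        print(n, L[n] / n**2, p2**2, variance_of_fraction)
```

As a hand check with $k = 2$ and $n = 4$: $M_{4;2}$ equals 2 with probability $\frac13$ and 4 with probability $\frac23$, so $EM_{4;2}^2 = 12$, which is what the recursion returns.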

/V 

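Exercise 3.1. Show that if for some $n_0$, the positive sequence $\{R_n\}_{n=n_0}^{\infty}$ satisfies $R_n \le \frac{n - k + 3}{n - k + 1}R_{n-1} + cn^2$ ($R_n \ge \frac{n - k + 3}{n - k + 1}R_{n-1} + cn^2$), then $\limsup_{n\to\infty}\frac{R_n}{n^3} \le c$ ($\liminf_{n\to\infty}\frac{R_n}{n^3} \ge c$).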

Exercise 3.2. Any molecule that remains unbonded at the end of the nearest neighbor $k$-tuple bonding process occurs in a maximal row of $j$ unbonded molecules, for some $j \in [k - 1]$. In the limit as $n \to \infty$, what fraction of molecules ends up in a maximal row of $j$ unbonded molecules? Let’s denote these fractions by $q_{k;j}$, $j \in [k - 1]$. Of course $\sum_{j=1}^{k-1} q_{k;j} = 1 - p_k$.

(a) Let $k \ge 3$ and fix $j \in [k - 1]$. Consider the following bonding process: implement the bonding of nearest neighbor $k$-tuples as described in the chapter. When this process terminates, bond all the unbonded molecules that occur in a maximal row of $j$ unbonded molecules, but leave untouched all unbonded molecules that occur in a maximal row of $i$ unbonded molecules, for some $i \ne j$. Let $M_{n;k,j}$ denote the number of bonded molecules at the end of the process, and let $H_n^{(k,j)} = EM_{n;k,j}$. Let $S_n^{(k,j)} = \sum_{i=0}^{n} H_i^{(k,j)}$. Convince yourself that $H_n^{(k,j)}$ satisfies the recursion equation (3.9) and that it satisfies the boundary condition (3.7) with one change, namely $H_j^{(k,j)} = j$, instead of $H_j^{(k,j)} = 0$. Thus, $\{S_n^{(k,j)}\}_{n=0}^{\infty}$ satisfies the recursion equation (3.13), and in place of the boundary condition (3.12), it satisfies the boundary condition

\[
S_n^{(k,j)} = 0, \quad n = 0, \ldots, j - 1; \qquad S_n^{(k,j)} = j, \quad n = j, \ldots, k - 1.
\]

(b) Let $g_j(t) = \sum_{n=0}^{\infty} S_n^{(k,j)}t^n$ denote the generating function for $\{S_n^{(k,j)}\}_{n=0}^{\infty}$. Show that $g_j$ solves the differential equation $g_j'(t) = a(t)g_j(t) + b_j(t)$, where $a$ is as in (3.20) and


bj(t) = b{t ) + 


-j(k - 1 - j)C l +j(k-j)t j 

( 1 - 0 3 



k — 1 


9 


with b as in (3.20). 

(c) In particular, note that $b_{k-1} = b$; therefore, $g_{k-1}$ satisfies the same differential
equation satisfied by $g$. Thus, (3.21) holds for $g_{k-1}$; that is,

\[
g_{k-1}(t) = g_{k-1}(\epsilon)\, e^{\int_\epsilon^t a(r)\,dr} + \int_\epsilon^t b(s)\, e^{\int_s^t a(r)\,dr}\,ds, \quad t \in (\epsilon, 1).
\]

Use the fact that $g_{k-1}(\epsilon) = (k-1)\epsilon^{k-1} + O(\epsilon^k)$, as $\epsilon \to 0$, along with (3.23)
to show that


lim g k - 1 (e)e^ a(r)dr = (k - 1) 


t 


k — 1 


k — 1 


e — >0 


0 - 0 : 




(d) Use (c) to show that

\[
q_{k;k-1} = (k-1)\, e^{-2\left(1 + \frac12 + \cdots + \frac{1}{k-1}\right)}.
\]

In particular then, $q_{3;2} = 2e^{-3} \approx 0.0996 \approx 0.100$, and consequently $q_{3;1} =
1 - p_3 - q_{3;2} \approx 1 - 0.8237 - 0.0996 \approx 0.077$.

(e) It is well known that $\lim_{n\to\infty} \bigl(\sum_{j=1}^{n} \frac1j - \log n\bigr)$ exists; the limit is called Euler’s
constant and is denoted by $\gamma$. One has $\gamma \approx 0.5772$. For a proof, see, for
example, [25]. Show that

\[
q_{k;k-1} \sim \frac{e^{-2\gamma}}{k-1}, \quad \text{as } k \to \infty.
\]
(f) For $j \in [k-2]$, one obtains

\[
g_j(t) = g_j(\epsilon)\, e^{\int_\epsilon^t a(r)\,dr} + \int_\epsilon^t b_j(s)\, e^{\int_s^t a(r)\,dr}\,ds, \quad t \in (\epsilon, 1). \tag{3.39}
\]

Show that since $g_j(\epsilon) = j\epsilon^j + O(\epsilon^{j+1})$, as $\epsilon \to 0$, one has

\[
\lim_{\epsilon \to 0} g_j(\epsilon)\, e^{\int_\epsilon^t a(r)\,dr} = \infty.
\]

On the other hand, since $b_j$ appears instead of $b_{k-1} = b$, show that one also has

\[
\lim_{\epsilon \to 0} \int_\epsilon^t b_j(s)\, e^{\int_s^t a(r)\,dr}\,ds = -\infty.
\]

You are invited to show that the appropriate terms in $g_j(\epsilon)\, e^{\int_\epsilon^t a(r)\,dr}$ and
$\int_\epsilon^t b_j(s)\, e^{\int_s^t a(r)\,dr}\,ds$ cancel each other out and to obtain a finite limiting
expression as $\epsilon \to 0$ on the right hand side of (3.39). This limiting expression
is then also $g_j(t)$. One then has $\lim_{t \to 1} (1-t)^3 g_j(t) = p_k + q_{k;j}$, which gives
an explicit formula for $q_{k;j}$. The above analysis gets more involved the smaller
$j$ is. Try it first for $j = k-2$.
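
The closed form in part (d) and the asymptotics in part (e) are easy to check numerically. The following Python sketch is only an illustration and is not part of the original exercise; the function name `q_last` and the constant `EULER_GAMMA` are our own choices. It evaluates $q_{k;k-1} = (k-1)e^{-2(1 + \frac12 + \cdots + \frac{1}{k-1})}$ and compares $(k-1)\,q_{k;k-1}$ with $e^{-2\gamma}$.

```python
import math

# Euler's constant gamma, hard-coded to double precision (an assumption of this sketch).
EULER_GAMMA = 0.5772156649015329


def q_last(k):
    """q_{k;k-1} = (k-1) * exp(-2 * (1 + 1/2 + ... + 1/(k-1))), as in part (d)."""
    harmonic = sum(1.0 / i for i in range(1, k))
    return (k - 1) * math.exp(-2.0 * harmonic)


if __name__ == "__main__":
    print("q_{3;2} =", q_last(3))
    for k in (10, 100, 1000, 10000):
        # By part (e), (k-1) * q_{k;k-1} should approach exp(-2 * gamma).
        print(k, (k - 1) * q_last(k), math.exp(-2.0 * EULER_GAMMA))
```

For $k = 3$ the function returns $2e^{-3}$, in agreement with the value quoted in part (d).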


Chapter Notes 

The calculation of (3.1) in the case $k = 2$ goes back to an article by the Nobel Prize
winning chemist Flory in 1939 [21]. The problem was rediscovered by Page, who
obtained the asymptotic behavior for the mean and variance in the case $k = 2$ [28].
The method used there does not generalize to $k > 2$. Theorem 3.1 seems to be new.
A continuous space version of this problem was considered by Rényi [31].


Chapter 4 

The Arcsine Laws for the One-Dimensional 
Simple Symmetric Random Walk 


The simple, symmetric random walk $\{S_n\}_{n=0}^{\infty}$ on $\mathbb{Z}$ starts at step $n = 0$ at $0 \in \mathbb{Z}$ and
at each successive step jumps one unit to the right or left, each with probability $\frac12$.
The random walk is called “simple” because the sizes of its jumps are restricted to
the set $\{1, -1\}$. One way to realize this random walk is as follows. Let $\{X_n\}_{n=1}^{\infty}$
be an infinite sequence of independent, identically distributed random variables
distributed according to the Bernoulli distribution with parameter $\frac12$; that is, $P(X_j =
1) = P(X_j = -1) = \frac12$. Now define $S_0 = 0$ and $S_n = \sum_{j=1}^{n} X_j$, $n \ge 1$.
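
The construction $S_n = \sum_{j=1}^{n} X_j$ translates directly into a simulation. The short Python sketch below is an illustration only; the function name `simple_random_walk` and the optional `seed` argument are our own conventions, not notation from the text.

```python
import random
from itertools import accumulate


def simple_random_walk(n_steps, seed=None):
    """Return [S_0, S_1, ..., S_n] for a simple, symmetric random walk on Z."""
    rng = random.Random(seed)
    # X_1, ..., X_n are independent and equal to +1 or -1 with probability 1/2 each.
    jumps = [rng.choice((1, -1)) for _ in range(n_steps)]
    # S_0 = 0 and S_n = X_1 + ... + X_n.
    return [0] + list(accumulate(jumps))


if __name__ == "__main__":
    print(simple_random_walk(20, seed=0))
```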

We begin with a fundamental fact about the simple, symmetric random walk 
on Z. 

Proposition 4.1.

\[
P\bigl(\limsup_{n\to\infty} S_n = \infty \ \text{and} \ \liminf_{n\to\infty} S_n = -\infty\bigr) = 1. \tag{4.1}
\]

Remark 1. A moment’s thought shows that (4.1) is equivalent to the statement that
the random walk is recurrent; that is, with probability one, the random walk visits
every site in $\mathbb{Z}$ infinitely often.

Remark 2. One can consider a simple, symmetric random walk $\{S_n\}_{n=0}^{\infty}$ on $\mathbb{Z}^d$, the
$d$-dimensional lattice; at each step it jumps in one of the $2d$ directions with
probability $\frac{1}{2d}$. Again, the random walk is called recurrent if with probability one every
site is visited infinitely often. It is called transient if $P(\lim_{n\to\infty} |S_n| = \infty) = 1$. In
1923, G. Pólya proved the quite surprising result that this random walk is recurrent
in two dimensions but transient in three or more dimensions. For a proof of this, see,
for example, [15].

Proof. By Remark 1 above, to prove the proposition, it suffices to prove that with
probability one, the random walk visits every site in $\mathbb{Z}$ infinitely often. Let $p$ denote
the probability that the random walk $\{S_n\}_{n=0}^{\infty}$ ever returns to its starting point 0.
We will show that $p = 1$. Let $N_0$ denote the number of times the random walk is
at 0 after time $n = 0$. Then of course, $P(N_0 = 0) = 1 - p$. Now let’s calculate
$P(N_0 = 1)$. In order to have $N_0 = 1$, the random walk must return to 0 and then
never return to 0 again. The probability of returning to 0 is $p$. If the random walk
returns to 0, it continues independently of everything that has already transpired.
Thus, conditioned on returning to 0, the probability that the random walk does not
return to 0 again is $1 - p$. So $P(N_0 = 1) = p(1-p)$. Continuing with this line of
reasoning, we obtain

\[
P(N_0 = n) = p^n (1-p), \quad n = 0, 1, \ldots.
\]

If $p = 1$, it follows from the above reasoning that $P(N_0 = \infty) = 1$; that
is, with probability one, the random walk visits 0 infinitely often. If $p \in (0, 1)$,
then the above calculation shows that $N_0$ is distributed according to the geometric
distribution with parameter $p$. For $p \in (0, 1)$, the expected value $EN_0$ of $N_0$ is
given by

\[
\begin{aligned}
EN_0 &= \sum_{n=0}^{\infty} n P(N_0 = n) = \sum_{n=0}^{\infty} n p^n (1-p) = p(1-p) \sum_{n=0}^{\infty} n p^{n-1} \\
&= p(1-p) \frac{d}{dp} \Bigl( \sum_{n=0}^{\infty} p^n \Bigr) = p(1-p) \frac{d}{dp} \Bigl( \frac{1}{1-p} \Bigr) = \frac{p}{1-p}.
\end{aligned} \tag{4.2}
\]

(The term by term differentiation above is permitted because for any $p_0 < 1$, the
series is uniformly absolutely convergent over $p \in [0, p_0]$.) Of course, if $p = 1$,
then $EN_0 = \infty$. Thus, the formula for $EN_0$ in (4.2) also holds if $p = 1$.

We now calculate $EN_0$ in a different way. Let $1_{\{S_n = 0\}}$ denote the indicator
random variable that is equal to 1 if $S_n = 0$ and is equal to 0 otherwise. Then
$N_0$, the number of times the random walk returns to 0, can be represented as

\[
N_0 = \sum_{n=1}^{\infty} 1_{\{S_n = 0\}}.
\]

By the linearity of the expectation and the nonnegativity of the summands, we
conclude that

\[
EN_0 = \sum_{n=1}^{\infty} P(S_n = 0), \tag{4.3}
\]

since $E 1_{\{S_n = 0\}} = 0 \cdot P(S_n \ne 0) + 1 \cdot P(S_n = 0) = P(S_n = 0)$.

Since the random walk starts at 0, it can only return to 0 at even times; thus,
$P(S_{2n+1} = 0) = 0$. Since the random walk has two equally likely choices at each
step, there are $2^{2n}$ equally likely paths that the random walk can traverse during its
first $2n$ steps. Now one has $S_{2n} = 0$ if and only if from among the first $2n$ jumps, $n$
of them were to the right and $n$ of them were to the left. There are $\binom{2n}{n}$ such paths;
thus,

\[
P(S_{2n} = 0) = \frac{\binom{2n}{n}}{2^{2n}}. \tag{4.4}
\]


Using Stirling’s formula, namely, $n! \sim n^n e^{-n} \sqrt{2\pi n}$ as $n \to \infty$, we have

\[
\frac{\binom{2n}{n}}{2^{2n}} = \frac{(2n)!}{(n!)^2\, 2^{2n}} \sim \frac{(2n)^{2n} e^{-2n} \sqrt{4\pi n}}{n^{2n} e^{-2n} (2\pi n)\, 2^{2n}} = \frac{1}{\sqrt{\pi n}}, \quad \text{as } n \to \infty. \tag{4.5}
\]

Since $\sum_{n=1}^{\infty} \frac{1}{\sqrt{\pi n}} = \infty$, it follows from (4.3)-(4.5) that $EN_0 = \infty$. In light of (4.2),
we conclude that $p = 1$.

We have shown that with probability one, the random walk returns to 0.
Upon returning to 0, the random walk continues independently of everything that
transpired previously; thus, in fact, with probability one, the random walk visits 0
infinitely often. From this, it is easy to show that in fact with probability one the
random walk visits every site infinitely often. We leave this as Exercise 4.1. □
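
The two quantitative ingredients of the proof, the exact formula (4.4) and the Stirling estimate (4.5), can be compared numerically. The sketch below is an illustration only, with function names of our own choosing. It also computes partial sums of the series in (4.3), which grow without bound, roughly like $2\sqrt{N/\pi}$.

```python
import math


def return_probability(n):
    """Exact P(S_{2n} = 0) = C(2n, n) / 2^{2n}, as in (4.4)."""
    return math.comb(2 * n, n) / 4 ** n


def stirling_estimate(n):
    """The asymptotic value 1 / sqrt(pi * n) from (4.5)."""
    return 1.0 / math.sqrt(math.pi * n)


def expected_visits_up_to(n_max):
    """Partial sum over n = 1, ..., n_max of P(S_{2n} = 0); see (4.3)."""
    return sum(return_probability(n) for n in range(1, n_max + 1))


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, return_probability(n), stirling_estimate(n))
    for n_max in (100, 400, 1600):
        print(n_max, expected_visits_up_to(n_max))
```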


Define

\[
T_0 = \inf\{n \ge 1 : S_n = 0\}.
\]

The random time $T_0$ is called the first return time to 0. By Proposition 4.1, it follows
that $P(T_0 < \infty) = 1$. However, perhaps surprisingly, one has $ET_0 = \infty$; the reader
is guided through a proof of this in Exercise 4.2. This result suggests that there is
quite some tendency for the random walk to take a long time to return to 0. In this
chapter we present two results which give vivid expression to this phenomenon.

The arcsine distribution will figure prominently in the results of this chapter. The
distribution function for this distribution is defined by

\[
F_{\arcsin}(x) = \frac{2}{\pi} \arcsin \sqrt{x}, \quad 0 \le x \le 1.
\]

The corresponding density function $f_{\arcsin}(x) = F'_{\arcsin}(x)$ is given by

\[
f_{\arcsin}(x) = \frac{1}{\pi \sqrt{x(1-x)}}, \quad 0 < x < 1.
\]
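
As a small sanity check on these two formulas, the following Python sketch (an illustration only; the function names are ours) evaluates the distribution function and the density, and compares a central difference quotient of $F_{\arcsin}$ with $f_{\arcsin}$.

```python
import math


def arcsine_cdf(x):
    """F_arcsin(x) = (2 / pi) * arcsin(sqrt(x)) for 0 <= x <= 1."""
    return (2.0 / math.pi) * math.asin(math.sqrt(x))


def arcsine_pdf(x):
    """f_arcsin(x) = 1 / (pi * sqrt(x * (1 - x))) for 0 < x < 1."""
    return 1.0 / (math.pi * math.sqrt(x * (1.0 - x)))


if __name__ == "__main__":
    h = 1e-6
    for x in (0.1, 0.3, 0.5, 0.9):
        # Central difference approximation to the derivative of the CDF.
        slope = (arcsine_cdf(x + h) - arcsine_cdf(x - h)) / (2 * h)
        print(x, arcsine_cdf(x), arcsine_pdf(x), slope)
```

Note that the density is symmetric about $x = \frac12$ and is unbounded near both endpoints.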

Our first theorem concerns the random time

\[
L_0^{(2n)} = \max\{k \le 2n : S_k = 0\},
\]

which is the last return time to 0 up to step $2n$. By parity considerations, $L_0^{(2n)}$ can
take on only even values.

Theorem 4.1.

\[
P(L_0^{(2n)} = 2k) = \frac{\binom{2k}{k} \binom{2n-2k}{n-k}}{2^{2n}}, \quad k = 0, 1, \ldots, n. \tag{4.6}
\]

Furthermore,

\[
\lim_{n\to\infty} P\Bigl(\frac{L_0^{(2n)}}{2n} \le x\Bigr) = \frac{2}{\pi} \arcsin \sqrt{x}, \quad 0 \le x \le 1. \tag{4.7}
\]

Remark. This theorem highlights the tendency of the random walk to take a long
time to return to 0. Indeed, since the density $f_{\arcsin}(x)$ blows up at $x = 0, 1$, it
follows from (4.7) that for large $n$ the most likely epochs $k$ for the last visit to 0 up
to time $2n$ are those satisfying $k = o(n)$ or $k = 2n - o(n)$, that is, those epochs
at the very beginning or at the very end of the trajectory. Since $\frac{2}{\pi} \arcsin \sqrt{\frac12} = \frac12$,
from (4.7) it also follows that for large $n$, there is a probability of about $\frac12$ that a
random walk trajectory of $2n$ steps will never return to 0 during the second half of
its life.
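
For small $n$, formula (4.6) can be confirmed by listing all $2^{2n}$ paths. The sketch below is a brute-force illustration and not the method of proof used in this chapter; the function names are our own. Exact rational arithmetic is used so that the comparison is free of rounding error.

```python
from fractions import Fraction
from itertools import accumulate, product
from math import comb


def last_zero_distribution(n):
    """Brute-force law of L_0^{(2n)}: a dict mapping 2k to P(L_0^{(2n)} = 2k)."""
    counts = {2 * k: 0 for k in range(n + 1)}
    for jumps in product((1, -1), repeat=2 * n):
        path = [0] + list(accumulate(jumps))
        # The index 0 always qualifies, since the path starts at 0.
        last_zero = max(i for i, x in enumerate(path) if x == 0)
        counts[last_zero] += 1
    return {t: Fraction(c, 4 ** n) for t, c in counts.items()}


def formula_4_6(n, k):
    """Right hand side of (4.6)."""
    return Fraction(comb(2 * k, k) * comb(2 * n - 2 * k, n - k), 4 ** n)


if __name__ == "__main__":
    n = 4
    for t, prob in sorted(last_zero_distribution(n).items()):
        print(t, prob, formula_4_6(n, t // 2))
```

The printed distribution is symmetric in $k \leftrightarrow n-k$ and has its largest values at $k = 0$ and $k = n$, in line with the remark above.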

Our second theorem concerns the random variable $O_{2n}^{+}$, which should be thought
of as the number of steps $k \in [2n]$ at which the random walk is positive
(or nonnegative). Of course, the number of steps between 1 and $2n$ that the random
walk is positive is usually not equal to the number of steps that it is nonnegative.
In order to obtain an exact result in closed form for all $n$, we need to work in a
symmetric setting. Therefore, if the random walk is equal to 0 at some step $2k$, we
classify that step as “positive” if the previous step was positive and “negative” if the
previous step was negative. That is,

\[
O_{2n}^{+} = |\{k \in [2n] : S_k > 0 \ \text{or} \ S_k = 0 \ \text{and} \ S_{k-1} > 0\}|.
\]

We call $O_{2n}^{+}$ the occupation time of the positive half line up to time $2n$. Then $\frac{O_{2n}^{+}}{2n}$
gives the fraction of steps among the first $2n$ steps that the random walk is in the
positive half line. Note that by parity considerations, $O_{2n}^{+}$ can only take on even
values.

Theorem 4.2.

\[
P(O_{2n}^{+} = 2k) = \frac{\binom{2k}{k} \binom{2n-2k}{n-k}}{2^{2n}}, \quad k = 0, 1, \ldots, n. \tag{4.8}
\]

Furthermore,

\[
\lim_{n\to\infty} P\Bigl(\frac{O_{2n}^{+}}{2n} \le x\Bigr) = \frac{2}{\pi} \arcsin \sqrt{x}, \quad 0 \le x \le 1. \tag{4.9}
\]
Fig. 4.1 A random walk path of length 17


Remark 1. Since the density $f_{\arcsin}(x)$ takes on its minimum at $x = \frac12$ and since
it blows up at $x = 0, 1$, it follows that for large $n$ the most likely percentages
of time that a random walk trajectory is nonnegative are around 0% and 100%,
while the least likely percentage is around 50%! To put it in a different way, if two
players bet a dollar each on a succession of fair coin flips, then after a long time it
is overwhelmingly more likely that one of the players was leading almost the whole
time than that each player was leading about half the time. This result even more
vividly highlights the tendency of the random walk to take a long time to return to 0.
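
The exact law (4.8) can also be checked by brute force for small $n$. The following Python sketch is an illustration only (the function name is ours); it counts, for every path of length $2n$, the number of “positive” steps in the symmetric sense defined before Theorem 4.2.

```python
from fractions import Fraction
from itertools import accumulate, product
from math import comb


def positive_time_distribution(n):
    """Brute-force law of O^+_{2n}: a dict mapping 2k to P(O^+_{2n} = 2k)."""
    counts = {2 * k: 0 for k in range(n + 1)}
    for jumps in product((1, -1), repeat=2 * n):
        path = [0] + list(accumulate(jumps))
        # Step k is "positive" if S_k > 0, or if S_k = 0 and S_{k-1} > 0.
        occupation = sum(
            1
            for k in range(1, 2 * n + 1)
            if path[k] > 0 or (path[k] == 0 and path[k - 1] > 0)
        )
        counts[occupation] += 1
    return {t: Fraction(c, 4 ** n) for t, c in counts.items()}


if __name__ == "__main__":
    n = 4
    for t, prob in sorted(positive_time_distribution(n).items()):
        k = t // 2
        print(t, prob, Fraction(comb(2 * k, k) * comb(2 * n - 2 * k, n - k), 4 ** n))
```

Already for $n = 4$ the extreme values $2k = 0$ and $2k = 8$ are the most likely ones, and $2k = 4$ is the least likely.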

Remark 2. Let $O_{2n}^{0} = |\{k \in [n] : S_{2k} = 0\}|$ denote the number of visits to 0 of the
random walk up to step $2n$. It is not hard to show that the random variable $\frac{O_{2n}^{0}}{2n}$,
denoting the fraction of steps up to $2n$ at which the random walk is at 0, converges
to 0 in probability; that is,

\[
\lim_{n\to\infty} P\Bigl(\frac{O_{2n}^{0}}{2n} > \epsilon\Bigr) = 0, \quad \text{for all } \epsilon > 0. \tag{4.10}
\]

We leave this as Exercise 4.3. In light of this, it follows that (4.9) would also hold if
we had defined $O_{2n}^{+}$ in an asymmetric fashion as the number of steps up to $2n$ for
which the random walk is nonnegative: $|\{k \in [2n] : S_k \ge 0\}|$.

Our approach to proving the above two theorems will be completely combinatorial
rather than probabilistic. Generating functions will play a seminal role.
A random walk path of length $m$ is a path $\{x_j\}_{j=0}^{m}$ which satisfies

\[
x_0 = 0; \qquad |x_j - x_{j-1}| = 1, \ j \in [m]. \tag{4.11}
\]

See Fig. 4.1. Since a random walk path has two choices at each step, there are $2^m$
random walk paths of length $m$. The probability that the simple, symmetric random
walk behaves in a certain way up until time $m$ is simply the number of random walk
paths that behave in that certain way divided by $2^m$.
Fig. 4.2 A Dyck path of length 16


Our basic combinatorial object upon which our results will be developed is the
Dyck path. A Dyck path of length $2n$ is a nonnegative random walk path $\{x_j\}_{j=0}^{2n}$ of
length $2n$ which returns to 0 at step $2n$; that is, in addition to satisfying (4.11) with
$m = 2n$, it also satisfies the following conditions:

\[
x_j \ge 0, \ j \in [2n]; \qquad x_{2n} = 0. \tag{4.12}
\]

See Fig. 4.2. We use generating functions to determine the number of Dyck paths.
Let $d_n$ denote the number of Dyck paths of length $2n$. We also define $d_0 = 1$.

Proposition 4.2. The number of Dyck paths of length $2n$ is given by

\[
d_n = \frac{1}{n+1} \binom{2n}{n}, \quad n \ge 1.
\]

Remark. The number $C_n = \frac{1}{n+1} \binom{2n}{n}$ is known as the $n$th Catalan number.

Proof. We derive a recursion formula for $\{d_n\}_{n=0}^{\infty}$. A primitive Dyck path of length
$2k$ is a Dyck path $\{x_j\}_{j=0}^{2k}$ of length $2k$ which satisfies $x_j > 0$ for $j = 1, \ldots, 2k-1$.
Let $v_k$ denote the number of primitive Dyck paths of length $2k$. Every Dyck path
of length $2n$ returns to 0 for the first time at $2k$, for some $k \in [n]$. Consider a Dyck
path of length $2n$ that returns to 0 for the first time at $2k$. The part of the path from
time 0 to time $2k$ is a primitive Dyck path of length $2k$, and the part of the path from
time $2k$ to $2n$ is an arbitrary Dyck path of length $2n - 2k$. (In Fig. 4.2, the Dyck
path of length 16 is composed of an initial primitive Dyck path of length 6, followed
by a Dyck path of length 10.) This reasoning yields the recurrence relation

\[
d_n = \sum_{k=1}^{n} v_k d_{n-k}, \quad n \ge 1. \tag{4.13}
\]

Now we claim that

\[
v_k = d_{k-1}, \quad k \ge 1. \tag{4.14}
\]
Indeed, a primitive Dyck path $\{x_j\}_{j=0}^{2k}$ must satisfy $x_1 = 1$, $x_j \ge 1$ for $j \in [2k-2]$,
$x_{2k-1} = 1$, $x_{2k} = 0$. Thus, letting $y_j = x_{j+1} - 1$, $0 \le j \le 2k-2$, it follows
that $\{y_j\}_{j=0}^{2k-2}$ is a Dyck path. Of course, this analysis can be reversed. This shows
that there is a 1-1 correspondence between primitive Dyck paths of length $2k$ and
arbitrary Dyck paths of length $2(k-1)$, proving (4.14). From (4.13) and (4.14) we
obtain the Dyck path recursion formula

\[
d_n = \sum_{k=1}^{n} d_{k-1} d_{n-k}. \tag{4.15}
\]

Let

\[
D(x) = \sum_{n=0}^{\infty} d_n x^n \tag{4.16}
\]

be the generating function for $\{d_n\}_{n=0}^{\infty}$. Since there are $2^{2n}$ random walk paths of
length $2n$, we have the trivial estimate $d_n \le 2^{2n} = 4^n$. Thus, the power series
defining $D(x)$ is absolutely convergent for $|x| < \frac14$. The product of two absolutely
convergent power series $\sum_{n=0}^{\infty} a_n x^n$ and $\sum_{n=0}^{\infty} b_n x^n$ is $\sum_{n=0}^{\infty} c_n x^n$, where $c_n =
\sum_{j=0}^{n} a_j b_{n-j}$. Thus, if in (4.15), the term $d_{k-1}$ were $d_k$ instead, and the summation
started from $k = 0$ instead of from $k = 1$, then we would have had $D^2(x) = D(x)$.
As it is, we “correct” for these deficiencies by multiplying by $x$ and adding 1: it is
easy to check that (4.16) and (4.15) give

\[
D(x) = x D^2(x) + 1. \tag{4.17}
\]


Solving this quadratic equation in $D$ gives $D(x) = \frac{1 \pm \sqrt{1-4x}}{2x}$. Since we know
from (4.16) that $D(0) = 1$, we conclude that the generating function for $\{d_n\}_{n=0}^{\infty}$ is
given by

\[
D(x) = \frac{1 - \sqrt{1-4x}}{2x}, \quad |x| < \frac14. \tag{4.18}
\]

Now $(1-4x)^{\frac12}|_{x=0} = 1$, $\bigl((1-4x)^{\frac12}\bigr)'|_{x=0} = -2$ and

\[
\frac{1}{n!} \bigl((1-4x)^{\frac12}\bigr)^{(n)}\Big|_{x=0} = -\frac{2^n}{n!} \prod_{j=1}^{n-1} (2j-1) = -\frac{2^n (2n-2)!}{n!\, 2^{n-1} (n-1)!} = -\frac{2}{2n-1} \binom{2n-1}{n}, \quad \text{for } n \ge 2;
\]

thus, the Taylor series for $\sqrt{1-4x}$ is given by

\[
\sqrt{1-4x} = 1 - 2x - \sum_{n=2}^{\infty} \frac{2}{2n-1} \binom{2n-1}{n} x^n. \tag{4.19}
\]

The coefficient of $x^{n+1}$ in (4.19) is $-\frac{2}{2n+1} \binom{2n+1}{n+1} = -\frac{2}{n+1} \binom{2n}{n}$. Using this along
with (4.18) and (4.19), we conclude that

\[
D(x) = \sum_{n=0}^{\infty} \frac{1}{n+1} \binom{2n}{n} x^n, \quad |x| < \frac14. \tag{4.20}
\]

From (4.20) and (4.16) it follows that $d_n = \frac{1}{n+1} \binom{2n}{n}$. □
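
The recursion (4.15) and the closed form of Proposition 4.2 are easy to compare by machine. The Python sketch below is an illustration only, with function names of our own choosing. It computes $d_n$ from (4.15), from the Catalan formula, and, for small $n$, by direct enumeration of paths satisfying (4.11) and (4.12).

```python
from itertools import accumulate, product
from math import comb


def dyck_numbers(n_max):
    """d_0, ..., d_{n_max} from d_0 = 1 and the recursion (4.15)."""
    d = [1]
    for n in range(1, n_max + 1):
        d.append(sum(d[k - 1] * d[n - k] for k in range(1, n + 1)))
    return d


def catalan(n):
    """Closed form of Proposition 4.2: C(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)


def count_dyck_paths_brute_force(n):
    """Count nonnegative paths of length 2n that end at 0, by enumeration."""
    count = 0
    for jumps in product((1, -1), repeat=2 * n):
        path = [0] + list(accumulate(jumps))
        if min(path) >= 0 and path[-1] == 0:
            count += 1
    return count


if __name__ == "__main__":
    print(dyck_numbers(10))
    print([catalan(n) for n in range(11)])
    print([count_dyck_paths_brute_force(n) for n in range(7)])
```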

The proof of the proposition gives us the following corollary. 

Corollary 4.1. The generating function for the sequence $\{d_n\}_{n=0}^{\infty}$, which counts
Dyck paths, is given by

\[
D(x) = \frac{1 - \sqrt{1-4x}}{2x}, \quad |x| < \frac14.
\]

Let $w_n$ denote the number of nonnegative random walk paths of length $2n$. The
difference between such a path and a Dyck path is that for such a path there is no
requirement that it return to 0 at time $2n$. We also define $w_0 = 1$. We now calculate
$\{w_n\}_{n=0}^{\infty}$ by deriving a recursion formula which involves $\{d_n\}_{n=0}^{\infty}$.

Proposition 4.3. The number $w_n$ of nonnegative random walk paths of length $2n$ is
given by

\[
w_n = \binom{2n}{n}. \tag{4.21}
\]

Remark. The number of random walk paths of length $2n$ that return to 0 at time $2n$
is also given by $\binom{2n}{n}$, since to obtain such a path, we must choose $n$ jumps of $+1$ and
$n$ jumps of $-1$. Thus, we have the following somewhat surprising corollary.

Corollary 4.2.

\[
P(S_1 \ge 0, \ldots, S_{2n} \ge 0) = P(S_{2n} = 0).
\]

Proof of Proposition 4.3. Of course every nonnegative random walk path of length
$2n + 2$, when restricted to its first $2n$ steps, constitutes a nonnegative random walk
path of length $2n$. A nonnegative random walk path of length $2n$ which does not
return to 0 at time $2n$, that is, which is not a Dyck path, can be extended in four
different ways to create a nonnegative random walk path of length $2n + 2$. On the
other hand, a nonnegative random walk path of length $2n$ which is a Dyck path can
only be extended in two different ways to create a nonnegative random walk path of
length $2n + 2$. Thus, we have the relation

\[
w_{n+1} = 4(w_n - d_n) + 2d_n = 4w_n - 2d_n, \quad n \ge 0. \tag{4.22}
\]

Let

\[
W(x) = \sum_{n=0}^{\infty} w_n x^n
\]

be the generating function for $\{w_n\}_{n=0}^{\infty}$. As with the power series defining $D(x)$,
it is clear that the power series defining $W(x)$ converges for $|x| < \frac14$. Multiply
equation (4.22) by $x^n$ and sum over $n$ from 0 to $\infty$. On the left side we obtain
$\sum_{n=0}^{\infty} w_{n+1} x^n = \frac{1}{x}(W(x) - 1)$, and on the right hand side we obtain $4W(x) - 2D(x)$.
From the resulting equation, $\frac{1}{x}(W(x) - 1) = 4W(x) - 2D(x)$, we obtain

\[
W(x) = \frac{1 - 2xD(x)}{1 - 4x}. \tag{4.23}
\]

Substituting for $D(x)$ in (4.23) from Corollary 4.1, we obtain

\[
W(x) = \frac{1}{\sqrt{1-4x}}, \quad |x| < \frac14.
\]

We have $W(0) = 1$, and for $n \ge 1$,

\[
\frac{W^{(n)}(0)}{n!} = \frac{1}{n!} \bigl((1-4x)^{-\frac12}\bigr)^{(n)}\Big|_{x=0} = \frac{1}{n!}\, 2^n \prod_{j=1}^{n} (2j-1) = \frac{1}{n!}\, 2^n\, \frac{(2n)!}{2^n\, n!} = \binom{2n}{n}.
\]

Thus the Taylor series for $W(x)$ is given by

\[
W(x) = \sum_{n=0}^{\infty} \binom{2n}{n} x^n, \quad |x| < \frac14,
\]

and we conclude that $w_n = \binom{2n}{n}$. □
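
The relation (4.22) determines $w_n$ recursively once the Dyck numbers are known. The sketch below is an illustration only (function names are ours). It iterates (4.22), using the closed form for $d_n$ from Proposition 4.2, and compares the result with $\binom{2n}{n}$ and with a direct enumeration for small $n$.

```python
from itertools import accumulate, product
from math import comb


def nonnegative_path_counts(n_max):
    """w_0, ..., w_{n_max} via w_0 = 1 and w_{n+1} = 4 w_n - 2 d_n, as in (4.22)."""
    w = [1]
    for n in range(n_max):
        d_n = comb(2 * n, n) // (n + 1)  # Dyck number from Proposition 4.2
        w.append(4 * w[n] - 2 * d_n)
    return w


def count_nonnegative_paths(n):
    """Count random walk paths of length 2n that never go below 0, by enumeration."""
    return sum(
        1
        for jumps in product((1, -1), repeat=2 * n)
        if all(height >= 0 for height in accumulate(jumps))
    )


if __name__ == "__main__":
    print(nonnegative_path_counts(8))
    print([comb(2 * n, n) for n in range(9)])
    print([count_nonnegative_paths(n) for n in range(6)])
```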

Armed with Propositions 4.2 and 4.3, we can give a quick proof of (4.6). 

Proof of Theorem 4.1. By the remark after Proposition 4.3, it follows that (4.6)
holds for $k = n$. So we now assume that $k \in \{0, 1, \ldots, n-1\}$. Given a random
walk path, $\{x_j\}_{j=0}^{l}$, we define the negative of the path to be the path $\{-x_j\}_{j=0}^{l}$.
If a random walk path of length $2n$ satisfies $L_0^{(2n)} = 2k$, then its first $2k$ steps
constitute a random walk path that returns to 0 at time $2k$, and its last $2n - 2k$
steps constitute either a random walk path that is strictly positive or the negative
of such a path. As noted in the remark after Proposition 4.3, there are $\binom{2k}{k}$ random
walk paths of length $2k$ that return to 0 at time $2k$. How many strictly positive
random walk paths of length $2n - 2k$ are there? Let $\{x_j\}_{j=0}^{2n-2k}$ be such a path. Then
$x_1 = 1$, and by parity considerations, $x_{2n-2k} \ge 2$. Consider now the part of the
path from time 1 to time $2n - 2k$. If we relabel and subtract one, $y_j = x_{j+1} - 1$,
$j = 0, 1, \ldots, 2n - 2k - 1$, then we obtain a nonnegative random walk path of length
$2n - 2k - 1$. By defining $y_{2n-2k} = y_{2n-2k-1} \pm 1$, we can extend this path in two
ways to get a nonnegative random walk path of length $2n - 2k$. This reasoning
shows that there is a two-to-one correspondence between nonnegative random walk
paths of length $2n - 2k$ and strictly positive random walk paths of length $2n - 2k$.
We know that there are $w_{n-k} = \binom{2n-2k}{n-k}$ nonnegative random walk paths of length
$2n - 2k$; thus, we conclude that the number of strictly positive random walk paths
of length $2n - 2k$ is equal to $\frac12 \binom{2n-2k}{n-k}$. We conclude from the above analysis that
the number of random walk paths of length $2n$ that satisfy $L_0^{(2n)} = 2k$ is equal to
$\binom{2k}{k} \binom{2n-2k}{n-k}$, from which (4.6) follows.

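We now consider (4.7). In Exercise 4.4 the reader is asked to apply Stirling’s
formula and show that for any $\epsilon > 0$,

\[
\frac{\binom{2k}{k} \binom{2n-2k}{n-k}}{2^{2n}} \sim \frac{1}{\pi} \frac{1}{\sqrt{k(n-k)}}, \quad \text{uniformly over } \epsilon n \le k \le (1-\epsilon) n, \ \text{as } n \to \infty. \tag{4.24}
\]

Using (4.24) and (4.6), we have for $0 < a < b < 1$

\[
P\Bigl(a < \frac{L_0^{(2n)}}{2n} \le b\Bigr) = \sum_{k=[na]+1}^{[nb]} \frac{\binom{2k}{k} \binom{2n-2k}{n-k}}{2^{2n}} \sim \frac{1}{\pi} \sum_{k=[na]+1}^{[nb]} \frac{1}{\sqrt{k(n-k)}} = \frac{1}{\pi} \sum_{k=[na]+1}^{[nb]} \frac{1}{n} \frac{1}{\sqrt{\frac{k}{n}\bigl(1 - \frac{k}{n}\bigr)}}, \quad \text{as } n \to \infty. \tag{4.25}
\]

But the last term on the right hand side of (4.25) is a Riemann sum for
$\frac{1}{\pi} \int_a^b \frac{1}{\sqrt{x(1-x)}}\,dx$. Thus, letting $n \to \infty$ in (4.25) gives

\[
\lim_{n\to\infty} P\Bigl(a < \frac{L_0^{(2n)}}{2n} \le b\Bigr) = \frac{1}{\pi} \int_a^b \frac{1}{\sqrt{x(1-x)}}\,dx = \frac{2}{\pi} \arcsin \sqrt{b} - \frac{2}{\pi} \arcsin \sqrt{a}, \quad \text{for } 0 < a < b < 1,
\]

which is equivalent to (4.7). This completes the proof of Theorem 4.1. □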
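
The convergence in (4.7) can be observed numerically by summing the exact probabilities (4.6). The Python sketch below is an illustration only; the function name is ours, and floating point arithmetic is used, so the values are approximate. It uses the elementary recursion $u_k = u_{k-1} \frac{2k-1}{2k}$ for $u_k = \binom{2k}{k} 2^{-2k}$, which avoids very large integers.

```python
import math


def last_zero_cdf(n, x):
    """P(L_0^{(2n)} / (2n) <= x), obtained by summing (4.6) over k <= n * x."""
    # u[k] = C(2k, k) / 2^{2k}, computed via u[k] = u[k-1] * (2k - 1) / (2k).
    u = [1.0]
    for k in range(1, n + 1):
        u.append(u[-1] * (2 * k - 1) / (2 * k))
    k_max = math.floor(n * x)
    return sum(u[k] * u[n - k] for k in range(k_max + 1))


if __name__ == "__main__":
    for x in (0.1, 0.25, 0.5, 0.75, 0.9):
        limit = (2.0 / math.pi) * math.asin(math.sqrt(x))
        print(x, last_zero_cdf(1000, x), limit)
```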

We now turn to the proof of Theorem 4.2. 
Proof of Theorem 4.2. We need to prove (4.8). Of course, (4.9) follows from (4.8)
just like (4.7) followed from (4.6). Recalling the symmetric definition of $O_{2n}^{+}$, for
the purpose of this proof, we will refer to step $k$ as “positive” if either $S_k > 0$ or
$S_k = 0$ and $S_{k-1} > 0$. Let $c_{n,k}$ denote the number of random walk paths of length
$2n$ which are positive at exactly $2k$ steps. Since there are $2^{2n}$ random walk paths of
length $2n$, in order to prove (4.8), we need to prove that

\[
c_{n,k} = \binom{2k}{k} \binom{2n-2k}{n-k}. \tag{4.26}
\]

By Proposition 4.3, we have $c_{n,n} = \binom{2n}{n}$, and by symmetry, $c_{n,0} = \binom{2n}{n}$; thus, (4.26)
holds for $k = 0, n$.

Consider now $k \in [n-1]$. A random walk path that satisfies $O_{2n}^{+} = 2k$
must return to 0 before step $2n$. Consider the first return to 0. If the path was
positive before the first return to 0, then the first return to 0 must occur at step $2j$, for
some $j \in [k]$ (for otherwise, the path would be positive for more than $2k$ steps). If
the path was negative before the first return to 0, then the first return to 0 must occur
at step $2j$, for some $j \in [n-k]$ (for otherwise the path would be positive for fewer
than $2k$ steps). In light of these facts, and recalling that $v_j = d_{j-1}$ is the number of
primitive Dyck paths of length $2j$, it follows that for $j \in [k]$, the number of random
walk paths of length $2n$ which start out positive, return to 0 for the first time at step
$2j$, and are positive for exactly $2k$ steps is equal to $d_{j-1} c_{n-j,k-j}$. Similarly, for
$j \in [n-k]$, the number of random walk paths of length $2n$ which start out negative,
return to 0 for the first time at step $2j$, and are positive for exactly $2k$ steps is equal
to $d_{j-1} c_{n-j,k}$. Thus, we obtain the recursion relation

\[
c_{n,k} = \sum_{j=1}^{k} d_{j-1} c_{n-j,k-j} + \sum_{j=1}^{n-k} d_{j-1} c_{n-j,k}, \quad k \in [n-1]. \tag{4.27}
\]

Let $e_n := \binom{2n}{n}$, $n \ge 0$. As follows from the remark after Proposition 4.3, for
$n \ge 1$, $e_n$ is the number of random walk paths of length $2n$ that are equal to 0 at
step $2n$. We derive a recursion formula for $\{e_n\}_{n=0}^{\infty}$. A random walk path of length
$2n$ which is equal to 0 at step $2n$ must return to 0 for the first time at step $2k$, for
some $k \in [n]$. The number of random walk paths of length $2n$ which are equal to 0
at time $2n$ and which return to 0 for the first time at step $2k$ is equal to $2v_k e_{n-k} =
2d_{k-1} e_{n-k}$. Consequently, we obtain the recursion formula

\[
e_n = \sum_{k=1}^{n} 2 d_{k-1} e_{n-k}. \tag{4.28}
\]

We can now prove (4.26) by considering (4.27) and (4.28) and applying induction.
To prove (4.26) we need to show that for all $n \ge 1$,

\[
c_{n,k} = e_k e_{n-k}, \quad \text{for } k = 0, 1, \ldots, n. \tag{4.29}
\]
When $n = 1$, (4.29) clearly holds. We now assume that (4.29) holds for all $n \le n_0$
and prove that it also holds for $n = n_0 + 1$. When $n = n_0 + 1$ and $k = 0$ or
$k = n_0 + 1$, we already know that (4.29) holds. So we need to show that (4.29)
holds for $n = n_0 + 1$ and $k \in [n_0]$. Using (4.27) for the first equality, using the
inductive assumption for the second equality, and using (4.28) for the third equality,
we have

\[
\begin{aligned}
c_{n_0+1,k} &= \sum_{j=1}^{k} d_{j-1} c_{n_0+1-j,k-j} + \sum_{j=1}^{n_0+1-k} d_{j-1} c_{n_0+1-j,k} \\
&= \sum_{j=1}^{k} d_{j-1} e_{k-j} e_{n_0+1-k} + \sum_{j=1}^{n_0+1-k} d_{j-1} e_k e_{n_0+1-k-j} \\
&= \frac12 e_k e_{n_0+1-k} + \frac12 e_{n_0+1-k} e_k = e_k e_{n_0+1-k},
\end{aligned} \tag{4.30}
\]

which proves that (4.29) holds for $n = n_0 + 1$ and completes the proof of
Theorem 4.2. □
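
The induction above can be mirrored computationally: starting from the boundary values $c_{n,0} = c_{n,n} = e_n$, the recursion (4.27) determines every $c_{n,k}$, and the result can be compared with $e_k e_{n-k}$. The following sketch is an illustration only; the function name is ours, and the closed forms for $d_n$ and $e_n$ are taken from Propositions 4.2 and 4.3.

```python
from math import comb


def positive_time_counts(n_max):
    """Compute c_{n,k} for 1 <= n <= n_max from the recursion (4.27).

    Returns a dict keyed by (n, k). The boundary values c_{n,0} = c_{n,n} = e_n
    are the ones established at the start of the proof of Theorem 4.2.
    """
    d = [comb(2 * m, m) // (m + 1) for m in range(n_max + 1)]  # Dyck numbers
    e = [comb(2 * m, m) for m in range(n_max + 1)]             # e_m = C(2m, m)
    c = {}
    for n in range(1, n_max + 1):
        c[n, 0] = c[n, n] = e[n]
        for k in range(1, n):
            starts_positive = sum(d[j - 1] * c[n - j, k - j] for j in range(1, k + 1))
            starts_negative = sum(d[j - 1] * c[n - j, k] for j in range(1, n - k + 1))
            c[n, k] = starts_positive + starts_negative
    return c


if __name__ == "__main__":
    c = positive_time_counts(6)
    for k in range(7):
        print(k, c[6, k], comb(2 * k, k) * comb(12 - 2 * k, 6 - k))
```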

Exercise 4.1. This exercise completes the proof of Proposition 4.1. We proved that
with probability one, the simple, symmetric random walk on $\mathbb{Z}$ visits 0 infinitely
often.

(a) For fixed $x \in \mathbb{Z}$, use the fact that with probability one the random walk visits
0 infinitely often to show that with probability one the random walk visits $x$
infinitely often. (Hint: Every time the process returns to 0, it has probability
$(\frac12)^{|x|}$ of moving directly to $x$ in $|x|$ steps.)

(b) Show that with probability one the random walk visits every $x \in \mathbb{Z}$ infinitely
often.

Exercise 4.2. In this exercise, you will prove that $ET_0 = \infty$, where $T_0$ is the first
return time to 0. We can consider the random walk starting from any $j \in \mathbb{Z}$, rather
than just from 0. When we start the random walk from $j$, denote the corresponding
probabilities and expectations by $P_j$ and $E_j$. Fix $n \ge 1$ and consider starting the
random walk from some $j \in \{0, 1, \ldots, n\}$. Let $T_{0,n}$ denote the first nonnegative
time that the random walk is at 0 or $n$.

(a) Define $g(j) = E_j T_{0,n}$. By analyzing what happens on the first step, show that
$g$ solves the difference equation $g(j) = 1 + \frac12 g(j+1) + \frac12 g(j-1)$, for
$j = 1, \ldots, n-1$. Note that one has the boundary conditions $g(0) = g(n) = 0$.

(b) Use (a) to show that $E_j T_{0,n} = j(n-j)$. (Hint: Write the difference equation
in the form $g(j+1) - g(j) = g(j) - g(j-1) - 2$.)

(c) In particular, (b) gives $E_1 T_{0,n} = n-1$. From this, conclude that $ET_0 = \infty$.
Exercise 4.3. Prove (4.10): $\lim_{n\to\infty} P\bigl(\frac{O_{2n}^{0}}{2n} > \epsilon\bigr) = 0$, for all $\epsilon > 0$. (Hint:
Represent $O_{2n}^{0}$ by $O_{2n}^{0} = \sum_{j=1}^{2n} 1_{\{S_j = 0\}}$, where $1_{\{S_j = 0\}}$ is as in the proof of
Proposition 4.1. From this representation, show that $\lim_{n\to\infty} E \frac{O_{2n}^{0}}{2n} = 0$. Conclude
from this that (4.10) holds.)

Exercise 4.4. Use Stirling’s formula to prove (4.24). That is, show that for any
$\epsilon, \delta > 0$, there exists an $n_{\epsilon,\delta}$ such that if $n \ge n_{\epsilon,\delta}$, then

\[
1 - \delta \le \frac{\binom{2k}{k} \binom{2n-2k}{n-k}}{2^{2n}}\, \pi \sqrt{k(n-k)} \le 1 + \delta,
\]

for all $k$ satisfying $\epsilon n \le k \le (1-\epsilon) n$.

Exercise 4.5. If one considers a simple, symmetric random walk $\{S_k\}_{k=0}^{\infty}$ up to
time $2n$, the probability of seeing any particular one of the $2^{2n}$ random walk paths
of length $2n$ is equal to $2^{-2n}$. Recall from the remark after Proposition 4.3 that there
are $\binom{2n}{n}$ random walk paths of length $2n$ that return to 0 at time $2n$. It follows from
symmetry that conditioned on $S_{2n} = 0$, the probability of seeing any particular one
of the $\binom{2n}{n}$ random walk paths of length $2n$ which return to 0 at time $2n$ is equal
to $\frac{1}{\binom{2n}{n}}$.

(a) Let $p \in (0, 1) - \{\frac12\}$ and consider the simple random walk on $\mathbb{Z}$ which jumps
one unit to the right with probability $p$ and one unit to the left with probability
$1 - p$. Denote the random walk by $\{S_n^{(p)}\}_{n=0}^{\infty}$. Consider this random walk up
to time $2n$. For each particular random walk path of length $2n$, calculate the
probability of seeing this path. The answer now depends on the path.

(b) Conditioned on $S_{2n}^{(p)} = 0$, show that the probability of seeing any particular one
of the $\binom{2n}{n}$ random walk paths of length $2n$ which return to 0 at time $2n$ is equal
to $\frac{1}{\binom{2n}{n}}$.

Exercise 4.6. Let $0 \le j \le m$. Consider the random walk $\{S_n^{(p)}\}_{n=0}^{\infty}$ as in
Exercise 4.5, with $p \in (0, 1)$, but starting from $j$, and denote probabilities by $P_j$.
Let $T_{0,m}$ denote the first nonnegative time that this random walk is at 0 or at $m$. Use
the method of Exercise 4.2 (analyzing what happens on the first step) to calculate
$P_j(S^{(p)}_{T_{0,m}} = 0)$, that is, the probability that starting from $j$, the random walk reaches
0 before it reaches $m$. (Hint: The calculation in the case $p = \frac12$ needs to be treated
separately.)


Chapter Notes 

The arcsine law in Theorem 4.2 was first proven by P. Lévy in 1939 in the context
of Brownian motion, which is a continuous time and continuous path version of the
simple, symmetric random walk. The proof of Theorem 4.2 is due to K.L. Chung and
W. Feller. One can find a proof in volume 1 of Feller’s classic text in probability [19].
One can also find there a proof of Theorem 4.1. Our proofs of these theorems are
a little different from Feller’s proofs. As expected, the proofs in Feller’s book have
a probabilistic flavor. We have taken a more combinatorial/counting approach via
generating functions. Proposition 4.3 and Corollary 4.2 can be derived alternatively
via the “reflection principle”; see [19]. For a nice little book on random walks from
the point of view of electrical networks, see Doyle and Snell [15]; for a treatise on
random walks, see the book by Spitzer [32].


Chapter 5 

The Distribution of Cycles in Random 
Permutations 


In this chapter we study the limiting behavior of the total number of cycles and of 
the number of cycles of fixed length in random permutations of [n\ as n -> oo. This 
class of problems springs from a classical question in probability called the envelope 
matching problem. You have n letters and n addressed envelopes. If you randomly 
place one letter in each envelope, what is the asymptotic probability as n -> oo that 
no letter is in its correct envelope? 

Let S n denote the set of permutations of [n\. Of course, S n is a group, but the 
group structure will not be relevant for our purposes. For us, a permutation a e 
S n is simply a 1-1 map of [n] onto [n\. The notation oy will be used to denote 
the image of j e [n\ under this map. We have \S n \ = n\. Let P„ denote the 

uniform probability measure on S n . That is, Pj/ ( A ) = for any subset A C S n . 
If oy = j , then j is called a fixed point for the permutation a. Let D n C S n 
denote the set of permutations that do not fix any points; that is, o e D n if oy ^ j, 
for all j e [n\. Such permutations are called derangements. The classical envelope 
matching problem then asks for lim^oo P„ (D n ). 

The standard way to solve the envelope matching problem is by the method of
inclusion-exclusion. Define $G_i = \{\sigma \in S_n : \sigma_i = i\}$. (We suppress the dependence
of $G_i$ on $n$ since $n$ is fixed in this discussion.) Then the complement $D_n^c$ of $D_n$ is
given by $D_n^c = \cup_{i=1}^n G_i$, and the inclusion-exclusion principle states that

\[
P_n(\cup_{i=1}^n G_i) = \sum_{i=1}^n P_n(G_i) - \sum_{1 \le i < j \le n} P_n(G_i \cap G_j)
+ \sum_{1 \le i < j < k \le n} P_n(G_i \cap G_j \cap G_k) - \cdots + (-1)^{n-1} P_n(\cap_{i=1}^n G_i).
\]

(See Exercise A.2 in Appendix A.) Each of the probabilities above can be computed
readily. After some calculations one finds that
$P_n(D_n) = 1 - P_n(\cup_{i=1}^n G_i) = 1 - 1 + \frac{1}{2!} - \frac{1}{3!} + \cdots + (-1)^n \frac{1}{n!}$;
thus, $\lim_{n\to\infty} P_n(D_n) = e^{-1}$.
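The inclusion-exclusion value can be checked by brute force for small $n$. The following is a minimal sketch, not part of the original text; the function names are illustrative, and permutations are taken on {0, ..., n-1} rather than $[n]$ purely for indexing convenience.

```python
from itertools import permutations
from math import exp, factorial


def derangement_probability(n):
    """Probability that a uniform permutation of n points has no fixed point,
    computed by enumerating all n! permutations of range(n)."""
    count = sum(
        1
        for perm in permutations(range(n))
        if all(perm[j] != j for j in range(n))
    )
    return count / factorial(n)


def alternating_sum(n):
    """The inclusion-exclusion value sum_{i=0}^{n} (-1)^i / i!."""
    return sum((-1) ** i / factorial(i) for i in range(n + 1))


if __name__ == "__main__":
    for n in range(1, 9):
        print(n, derangement_probability(n), alternating_sum(n), exp(-1))
```

The two computed columns should agree up to rounding, and both approach $e^{-1}$ quickly as $n$ grows.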


R.G. Pinsky, Problems from the Discrete to the Continuous , Universitext, 

DOI 10.1007/978-3-319-07965-3_5, © Springer International Publishing Switzerland 2014

Here is an elegant, alternative proof using generating functions. Let $d_k^{(n)}$ denote
the number of permutations in $S_n$ that fix exactly $k$ points. We need to calculate
$\lim_{n\to\infty} \frac{d_0^{(n)}}{n!}$. Clearly,

\[
\sum_{k=0}^n d_k^{(n)} = n!, \tag{5.1}
\]

since every permutation fixes $k$ points, for some $k$. To construct a permutation in
$S_n$ that fixes exactly $k$ points, first we can choose $k$ numbers from $[n]$ for the fixed
points, and then we must choose a permutation of the other $n - k$ numbers that fixes
none of them; thus,

\[
d_k^{(n)} = \binom{n}{k} d_0^{(n-k)}.
\]

Substituting this in (5.1) gives

\[
\sum_{k=0}^n \binom{n}{k} d_0^{(n-k)} = n!,
\]

or equivalently

\[
\sum_{k=0}^n \frac{1}{k!} \frac{d_0^{(n-k)}}{(n-k)!} = 1. \tag{5.2}
\]


If one multiplies the absolutely convergent power series $\sum_{n=0}^\infty a_n x^n$ by the
absolutely convergent power series $\sum_{n=0}^\infty b_n x^n$, one gets the absolutely convergent
power series $\sum_{n=0}^\infty c_n x^n$, where $c_n = \sum_{k=0}^n a_k b_{n-k}$. Thus, it follows from (5.2)
that

\[
\Big( \sum_{n=0}^\infty \frac{x^n}{n!} \Big) \Big( \sum_{n=0}^\infty \frac{d_0^{(n)}}{n!} x^n \Big) = \sum_{n=0}^\infty x^n,
\]

or

\[
\sum_{n=0}^\infty \frac{d_0^{(n)}}{n!} x^n = \frac{e^{-x}}{1-x}, \quad |x| < 1. \tag{5.3}
\]


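Thus $\frac{d_0^{(n)}}{n!}$ is the coefficient of $x^n$ in

\[
\frac{e^{-x}}{1-x} = \Big( 1 - x + \frac{x^2}{2!} - \frac{x^3}{3!} + \cdots \Big) \big( 1 + x + x^2 + x^3 + \cdots \big),
\]

and this is easily seen to give
$\frac{d_0^{(n)}}{n!} = 1 - 1 + \frac{1}{2!} - \frac{1}{3!} + \cdots + (-1)^n \frac{1}{n!}$.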
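The generating function argument can be mirrored numerically: equation (5.2) determines the ratios $d_0^{(n)}/n!$ recursively, and the result should match the alternating sum. A minimal sketch using exact rational arithmetic follows; the function names are illustrative, and the convention $d_0^{(0)} = 1$ is assumed.

```python
from fractions import Fraction
from math import factorial


def derangement_ratios(n_max):
    """Solve (5.2) for r_n = d_0^{(n)} / n!, n = 0, ..., n_max.

    Assumes the convention r_0 = 1 (the empty permutation fixes no point).
    """
    r = []
    for n in range(n_max + 1):
        # (5.2): sum_{k=0}^{n} r_{n-k} / k! = 1, and the k = 0 term is r_n itself.
        tail = sum(r[n - k] / factorial(k) for k in range(1, n + 1))
        r.append(Fraction(1) - tail)
    return r


def alternating_sum(n):
    """sum_{i=0}^{n} (-1)^i / i! as an exact fraction."""
    return sum(Fraction((-1) ** i, factorial(i)) for i in range(n + 1))


if __name__ == "__main__":
    for n, ratio in enumerate(derangement_ratios(8)):
        print(n, ratio, alternating_sum(n))
```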
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In order to begin our study of the behavior of the number of cycles and of the 
number of cycles of fixed length in random permutations, we recall some basic facts
and notation concerning cycles of permutations. Consider the permutation $\sigma \in S_4$
given in two-line form by $\begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 4 & 1 & 3 \end{pmatrix}$. This means that $\sigma_1 = 2$, $\sigma_2 = 4$, etc. Since
1 goes to 2, 2 goes to 4, 4 goes to 3, and 3 goes back to 1, we call $\sigma$ cyclic and
denote this by writing $\sigma = (1\,2\,4\,3)$. (We could also just as well write it as $(4\,3\,1\,2)$,
for example.) Recall that every permutation can be decomposed into a product of
disjoint cycles. For example, consider $\sigma \in S_8$ given by $\begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 3 & 2 & 5 & 8 & 6 & 7 & 1 & 4 \end{pmatrix}$. Under $\sigma$,
1 goes to 3, 3 goes to 5, 5 goes to 6, 6 goes to 7, and 7 goes back to 1, closing a
cycle. Now 2 goes to 2, which makes a cycle unto itself, and finally, 4 goes to 8
and 8 goes back to 4. Therefore, we write $\sigma = (1\,3\,5\,6\,7)(2)(4\,8)$ or, alternatively,
$\sigma = (1\,3\,5\,6\,7)(4\,8)$; in the latter form, the convention is that every number that does
not appear at all forms a cycle unto itself. Note that $\sigma$ has one cycle of length 5, one
cycle of length 2, and one cycle of length 1.
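The passage from two-line form to cycle notation is mechanical, and a short sketch may help. The function name `cycle_decomposition` is illustrative; the input is assumed to be the bottom row of the two-line form, so that `images[j - 1]` is $\sigma_j$.

```python
def cycle_decomposition(images):
    """Return the disjoint cycles of the permutation j -> images[j - 1] of {1, ..., n}.

    Each cycle is started at the smallest number not yet visited, as in the text.
    """
    n = len(images)
    seen = set()
    cycles = []
    for start in range(1, n + 1):
        if start in seen:
            continue
        cycle = []
        j = start
        while j not in seen:
            seen.add(j)
            cycle.append(j)
            j = images[j - 1]  # follow j to its image
        cycles.append(tuple(cycle))
    return cycles


if __name__ == "__main__":
    print(cycle_decomposition([2, 4, 1, 3]))              # [(1, 2, 4, 3)]
    print(cycle_decomposition([3, 2, 5, 8, 6, 7, 1, 4]))  # [(1, 3, 5, 6, 7), (2,), (4, 8)]
```

Here fixed points are listed explicitly as cycles of length 1, which corresponds to the first of the two conventions described above.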

For $\sigma \in S_n$ and $j \in [n]$, let $C_j^{(n)}(\sigma)$ denote the number of cycles of length $j$ in
$\sigma$. Note that for all $\sigma \in S_n$, one has the identity

\[
\sum_{j=1}^n j\, C_j^{(n)}(\sigma) = n.
\]

We call $(C_1^{(n)}(\sigma), C_2^{(n)}(\sigma), \ldots, C_n^{(n)}(\sigma))$ the cycle type of the permutation $\sigma$. Let

\[
N^{(n)}(\sigma) = \sum_{j=1}^n C_j^{(n)}(\sigma)
\]

denote the number of cycles in the permutation $\sigma \in S_n$. Under the probability
measure $P_n$, we may think of $N^{(n)}$ and $C_j^{(n)}$ as random variables. In this chapter
we will investigate the limiting distribution of the random variable $N^{(n)}$ and of
the random variable $C_j^{(n)}$ for fixed $j$, as $n \to \infty$. In fact, more generally,
we will investigate the limiting distribution of the $j$-dimensional random vector
$(C_1^{(n)}, C_2^{(n)}, \ldots, C_j^{(n)})$. We call these cycles small cycles because their lengths are
fixed as $n \to \infty$.

Instead of just considering permutations under the uniform measure $P_n$, we will
consider permutations under a one-parameter family of probability measures which
includes the uniform measure as a particular case. For each $\theta \in (0, \infty)$, we define a
probability measure $P_n^{(\theta)}$ on $S_n$ by

\[
P_n^{(\theta)}(\{\sigma\}) = \frac{\theta^{N^{(n)}(\sigma)}}{K_n(\theta)},
\]

where

\[
K_n(\theta) = \sum_{\sigma \in S_n} \theta^{N^{(n)}(\sigma)}
\]

is the normalizing constant required to make $P_n^{(\theta)}$ a probability measure. Thus, under
the measure $P_n^{(\theta)}$, every permutation is weighted proportionally by the parameter
$\theta$ raised to an exponent equal to the number of cycles in the permutation.
Consequently, for $\theta > 1$, $P_n^{(\theta)}$ favors permutations with many cycles, and for $\theta < 1$,
it favors permutations with few cycles. Of course $\theta = 1$ corresponds to the uniform
measure: $P_n = P_n^{(1)}$. The original reason for considering the probability measures
$P_n^{(\theta)}$ can be attributed to Proposition 5.1 below, which gives the exact distribution
of the cycle types under $P_n^{(\theta)}$. In Exercise 5.1, the reader is asked to verify that
Proposition 5.1 follows from the definition of $P_n^{(\theta)}$ along with Proposition 5.2 and
Lemma 5.1, which are stated and proved in the course of the proofs of Theorems 5.1
and 5.2 below. We use the standard notation

\[
\theta^{(n)} := \theta(\theta + 1) \cdots (\theta + n - 1), \quad n \ge 1.
\]

This expression is sometimes referred to as a rising factorial; the notation is called
the Pochhammer symbol.

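Proposition 5.1. If $\sum_{j=1}^n j a_j = n$, then

\[
P_n^{(\theta)}\big( C_1^{(n)} = a_1, \ldots, C_n^{(n)} = a_n \big) = \frac{n!}{\theta^{(n)}} \prod_{j=1}^n \Big( \frac{\theta}{j} \Big)^{a_j} \frac{1}{a_j!}.
\]

The distribution in Proposition 5.1 is known as the Ewens sampling formula; it
arose originally in the context of population genetics.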
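As a sanity check on the Ewens sampling formula, one can confirm that the probabilities it assigns to all cycle types of $S_n$ sum to 1. The sketch below is illustrative only: the helper names are not from the text, a cycle type is represented as a dictionary from cycle length $j$ to multiplicity $a_j$ (omitting lengths with $a_j = 0$), and $\theta$ is taken rational so that the arithmetic is exact.

```python
from fractions import Fraction
from math import factorial


def cycle_types(n, largest=None):
    """Yield every cycle type of S_n as a dict {cycle length j: multiplicity a_j}."""
    if largest is None:
        largest = n
    if n == 0:
        yield {}
        return
    for j in range(min(n, largest), 0, -1):
        for a in range(1, n // j + 1):
            # use exactly a cycles of length j, then only shorter cycles
            for rest in cycle_types(n - a * j, j - 1):
                yield {j: a, **rest}


def rising_factorial(theta, n):
    """theta^{(n)} = theta (theta + 1) ... (theta + n - 1)."""
    result = Fraction(1)
    for i in range(n):
        result *= theta + i
    return result


def ewens_probability(a, n, theta):
    """Right-hand side of Proposition 5.1 for the cycle type a."""
    p = Fraction(factorial(n)) / rising_factorial(theta, n)
    for j, a_j in a.items():
        p *= (Fraction(theta) / j) ** a_j / factorial(a_j)
    return p


if __name__ == "__main__":
    theta = Fraction(3, 2)
    for n in range(1, 8):
        print(n, sum(ewens_probability(a, n, theta) for a in cycle_types(n)))
```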

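We will prove a weak law of large numbers for the distribution of the number of
cycles $N^{(n)}$.

Theorem 5.1. Let $\theta \in (0, \infty)$. Under $P_n^{(\theta)}$, the distribution of the number of cycles
$N^{(n)}$ in a permutation satisfies

\[
\frac{N^{(n)}}{\log n} \to \theta \quad \text{in probability};
\]

that is, for all $\epsilon > 0$,

\[
\lim_{n\to\infty} P_n^{(\theta)}\Big( \Big| \frac{N^{(n)}}{\log n} - \theta \Big| > \epsilon \Big) = 0.
\]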
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We now consider the small cycles. A random variable Z is distributed according 
to the Poisson distribution with parameter $\lambda > 0$ ($Z \sim \mathrm{Pois}(\lambda)$) if

\[
P(Z = j) = e^{-\lambda} \frac{\lambda^j}{j!}, \quad \text{for } j = 0, 1, \ldots.
\]

The $j$ discrete random variables $\{X_i\}_{i=1}^j$ are called independent if
$P(X_1 = x_1, \ldots, X_j = x_j) = \prod_{i=1}^j P(X_i = x_i)$, for all choices of $\{x_i\}_{i=1}^j \subset \mathbb{R}$. In the
sequel, $Z_\lambda$ will denote a random variable distributed according to $\mathrm{Pois}(\lambda)$, and it
will always be assumed that $\{Z_{\lambda_i}\}_{i=1}^j$ are independent for distinct $\{\lambda_i\}_{i=1}^j$.

We will prove a weak convergence result for small cycles. 

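Theorem 5.2. Let $\theta \in (0, \infty)$. Let $j$ be a positive integer. Under the measure $P_n^{(\theta)}$,
the distribution of the random vector $(C_1^{(n)}, C_2^{(n)}, \ldots, C_j^{(n)})$ converges weakly to the
distribution of $(Z_\theta, Z_{\theta/2}, \ldots, Z_{\theta/j})$. That is,

\[
\lim_{n\to\infty} P_n^{(\theta)}\big( C_1^{(n)} = m_1, C_2^{(n)} = m_2, \ldots, C_j^{(n)} = m_j \big)
= \prod_{i=1}^j e^{-\theta/i} \frac{(\theta/i)^{m_i}}{m_i!}, \quad m_i \ge 0,\ i = 1, \ldots, j. \tag{5.4}
\]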


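Remark. Let $j$ be a positive integer and let $1 \le k_1 < k_2 < \cdots < k_j$. In
Exercise 5.7 the reader is asked to show that by making a small change in the proof
of Theorem 5.2, one has

\[
\lim_{n\to\infty} P_n^{(\theta)}\big( C_{k_1}^{(n)} = m_1, C_{k_2}^{(n)} = m_2, \ldots, C_{k_j}^{(n)} = m_j \big)
= \prod_{i=1}^j e^{-\theta/k_i} \frac{(\theta/k_i)^{m_i}}{m_i!}, \quad m_i \ge 0,\ i = 1, \ldots, j. \tag{5.5}
\]

In particular, for any fixed $j$, the distribution of $C_j^{(n)}$ converges weakly to the
$\mathrm{Pois}(\theta/j)$ distribution. Actually, (5.5) can be deduced directly from (5.4); see
Exercises 5.2 and 5.3.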
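In the special case $\theta = 1$ and $j = 1$, the convergence can be seen explicitly from the derangement computation at the start of the chapter: the relation $d_m^{(n)} = \binom{n}{m} d_0^{(n-m)}$ gives $P_n(C_1^{(n)} = m) = \frac{1}{m!} \sum_{i=0}^{n-m} \frac{(-1)^i}{i!}$, which should approach the $\mathrm{Pois}(1)$ probability $e^{-1}/m!$. The sketch below only illustrates this one case; the function names are assumptions made for the example.

```python
from fractions import Fraction
from math import exp, factorial


def fixed_point_law(n, m):
    """P_n(C_1^{(n)} = m) under the uniform measure, i.e. d_m^{(n)} / n!.

    Uses d_m^{(n)} = binom(n, m) d_0^{(n-m)} together with the alternating-sum
    formula for d_0^{(n-m)} / (n-m)!.
    """
    derangement_ratio = sum(
        Fraction((-1) ** i, factorial(i)) for i in range(n - m + 1)
    )
    return derangement_ratio / factorial(m)


def poisson_pmf(lam, m):
    """P(Z = m) for Z ~ Pois(lam)."""
    return exp(-lam) * lam ** m / factorial(m)


if __name__ == "__main__":
    for m in range(4):
        print(m, float(fixed_point_law(12, m)), poisson_pmf(1.0, m))
```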

Our proofs of these two theorems will be very combinatorial, through the method 
of generating functions. The use of purely probabilistic reasoning will be rather 
minimal. 

For the proofs of the two theorems, we will need to evaluate the normalizing
constant $K_n(\theta)$. Of course, this is trivial in the case of the uniform measure, that
is, the case $\theta = 1$. Let $s(n,k)$ denote the number of permutations in $S_n$ that have
exactly $k$ cycles. From the definition of $K_n(\theta)$, we have

\[
K_n(\theta) = \sum_{k=1}^n s(n,k)\, \theta^k. \tag{5.6}
\]
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Proposition 5.2. 


\[
K_n(\theta) = \theta^{(n)}.
\]

Remark. The numbers $s(n,k)$ are called unsigned Stirling numbers of the first kind.
Proposition 5.2 and (5.6) show that they arise as the coefficients of the polynomials
$q_n(\theta) := \theta^{(n)} = \theta(\theta + 1) \cdots (\theta + n - 1)$.

Proof. There are $(n-1)!$ permutations in $S_n$ that contain only one cycle and one
permutation in $S_n$ that contains $n$ cycles:

\[
s(n,1) = (n-1)!, \quad s(n,n) = 1. \tag{5.7}
\]

We prove the following recursion relation:

\[
s(n+1,k) = n\, s(n,k) + s(n,k-1), \quad n \ge 2,\ 2 \le k \le n. \tag{5.8}
\]

Note that (5.7) and (5.8) uniquely determine $s(n,k)$ for all $n \ge 1$ and all $k \in [n]$.

To create a permutation $\sigma' \in S_{n+1}$, we can start with a permutation $\sigma \in S_n$
and then take the number $n+1$ and either insert it into one of the existing cycles
of $\sigma$ or let it stand alone as a cycle of its own. If we insert $n+1$ into one of the
existing cycles, then $\sigma'$ will have $k$ cycles if and only if $\sigma$ has $k$ cycles. There are $n$
possible locations in which one can place the number $n+1$ and preserve the number
of cycles. (The reader should verify this.) Thus, from each permutation in $S_n$ with
$k$ cycles, we can construct $n$ permutations in $S_{n+1}$ with $k$ cycles. If, on the other
hand, we let $n+1$ stand alone in its own cycle, then $\sigma'$ will have $k$ cycles if and
only if $\sigma$ has $k-1$ cycles. Thus, from each permutation in $S_n$ with $k-1$ cycles, we
can construct one permutation in $S_{n+1}$ with $k$ cycles. Now (5.8) is the mathematical
expression of this verbal description.

Let $c_{n,k}$ denote the coefficient of $\theta^k$ in $q_n(\theta) = \theta(\theta + 1) \cdots (\theta + n - 1)$. Clearly
$c_{n,1} = (n-1)!$ and $c_{n,n} = 1$, for $n \ge 1$. Writing $q_{n+1}(\theta) = q_n(\theta)(\theta + n)$, one sees
that $c_{n+1,k} = n\, c_{n,k} + c_{n,k-1}$, for $n \ge 2$, $2 \le k \le n$. Thus, $c_{n,k}$ satisfies the same
recursion relation (5.8) and the same boundary condition (5.7) as does $s(n,k)$. We
conclude that $c_{n,k} = s(n,k)$. The proposition follows from this along with (5.6). □
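The two sides of this argument are easy to compare numerically: build $s(n,k)$ from the recursion (5.8), expand the polynomial $\theta(\theta+1)\cdots(\theta+n-1)$, and check that the coefficients coincide. The following is a minimal sketch with illustrative function names; for convenience it extends the recursion with the conventions $s(n,0) = 0$ and $s(n,n+1) = 0$, which reproduce the boundary values (5.7).

```python
def stirling_first_kind(n_max):
    """Table s[n][k] of unsigned Stirling numbers of the first kind, 1 <= n <= n_max.

    Built from s(n+1, k) = n s(n, k) + s(n, k-1), starting from s(1, 1) = 1 and
    using the conventions s(n, 0) = 0 and s(n, n+1) = 0.
    """
    s = [[0] * (n_max + 2) for _ in range(n_max + 1)]
    s[1][1] = 1
    for n in range(1, n_max):
        for k in range(1, n + 2):
            s[n + 1][k] = n * s[n][k] + s[n][k - 1]
    return s


def rising_factorial_coefficients(n):
    """Coefficients c[k] of theta^k in theta (theta + 1) ... (theta + n - 1)."""
    coeffs = [1]  # the constant polynomial 1
    for i in range(n):
        shifted = [0] + coeffs                   # theta * q(theta)
        scaled = [i * c for c in coeffs] + [0]   # i * q(theta)
        coeffs = [u + v for u, v in zip(shifted, scaled)]
    return coeffs


if __name__ == "__main__":
    table = stirling_first_kind(6)
    for n in range(1, 7):
        print(n, table[n][1 : n + 1], rising_factorial_coefficients(n)[1:])
```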

In light of Proposition 5.2, from now on, we write the probability measure $P_n^{(\theta)}$
in the form

\[
P_n^{(\theta)}(\{\sigma\}) = \frac{\theta^{N^{(n)}(\sigma)}}{\theta^{(n)}}.
\]


We now set the stage to prove Theorem 5.1. The probability generating function
$P_X(s)$ of a random variable $X$ taking nonnegative integral values is defined by

\[
P_X(s) = E s^X = \sum_{i=0}^\infty s^i P(X = i), \quad |s| \le 1.
\]
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The probability generating function uniquely determines the distribution; indeed, 
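$\frac{1}{i!} \frac{d^i P_X}{ds^i}(s) \big|_{s=0} = P(X = i)$. Let $P_{N^{(n)}}(s; \theta)$ denote the probability generating
function for the random variable $N^{(n)}$ under $P_n^{(\theta)}$:

\[
P_{N^{(n)}}(s; \theta) = \sum_{i=1}^n s^i P_n^{(\theta)}\big( N^{(n)} = i \big).
\]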

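Recalling that $s(n,i)$ denotes the number of permutations in $S_n$ with $i$ cycles, it
follows that

\[
P_n^{(\theta)}\big( N^{(n)} = i \big) = \frac{\theta^i s(n,i)}{\theta^{(n)}}.
\]

Using this with (5.6) and Proposition 5.2 gives

\[
P_{N^{(n)}}(s; \theta) = \sum_{i=1}^n \frac{s^i \theta^i s(n,i)}{\theta^{(n)}}
= \frac{(s\theta)^{(n)}}{\theta^{(n)}}
= \frac{s\theta (s\theta + 1) \cdots (s\theta + n - 1)}{\theta (\theta + 1) \cdots (\theta + n - 1)}
= \prod_{i=1}^n \Big( \frac{\theta}{\theta + i - 1}\, s + \frac{i-1}{\theta + i - 1} \Big). \tag{5.9}
\]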


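A random variable $X$ is distributed according to the Bernoulli distribution with
parameter $p \in [0, 1]$ if $P(X = 1) = p$ and $P(X = 0) = 1 - p$. We write
$X \sim \mathrm{Ber}(p)$. The probability generating function for such a random variable is
$ps + 1 - p$. Now let $\{X_{\theta(\theta+i-1)^{-1}}\}_{i=1}^n$ be independent random variables, where
$X_{\theta(\theta+i-1)^{-1}} \sim \mathrm{Ber}\big( \frac{\theta}{\theta+i-1} \big)$. Let $Z_{n,\theta} = \sum_{i=1}^n X_{\theta(\theta+i-1)^{-1}}$. Then the probability
generating function for $Z_{n,\theta}$ is given by

\[
P_{Z_{n,\theta}}(s) = E s^{Z_{n,\theta}} = E s^{\sum_{i=1}^n X_{\theta(\theta+i-1)^{-1}}}
= \prod_{i=1}^n E s^{X_{\theta(\theta+i-1)^{-1}}}
= \prod_{i=1}^n \Big( \frac{\theta}{\theta + i - 1}\, s + \frac{i-1}{\theta + i - 1} \Big). \tag{5.10}
\]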


For the third equality above we have used the fact that the expected value of a 
product of independent random variables is equal to the product of their expected 
values. From (5.9), (5.10), and the uniqueness of the probability generating function, 
we obtain the following proposition. 

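Proposition 5.3. Under $P_n^{(\theta)}$, the distribution of $N^{(n)}$ is equal to the distribution of
$\sum_{i=1}^n X_{\theta(\theta+i-1)^{-1}}$, where $\{X_{\theta(\theta+i-1)^{-1}}\}_{i=1}^n$ are independent random variables, and
$X_{\theta(\theta+i-1)^{-1}} \sim \mathrm{Ber}\big( \frac{\theta}{\theta+i-1} \big)$.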
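Proposition 5.3 can be checked exactly for small $n$ by computing both distributions: the law of $N^{(n)}$ under $P_n^{(\theta)}$ by enumerating $S_n$, and the law of the Bernoulli sum by repeated convolution. This is a sketch under the assumption that $\theta$ is rational (so exact fractions can be used); the function names are illustrative, and permutations act on {0, ..., n-1}.

```python
from fractions import Fraction
from itertools import permutations


def count_cycles(perm):
    """Number of cycles of a permutation of range(n), given as a tuple of images."""
    seen, cycles = set(), 0
    for start in range(len(perm)):
        if start not in seen:
            cycles += 1
            j = start
            while j not in seen:
                seen.add(j)
                j = perm[j]
    return cycles


def cycle_count_law(n, theta):
    """[P(N = 0), ..., P(N = n)] under the theta-weighted measure, by enumeration."""
    weights = [Fraction(0)] * (n + 1)
    for perm in permutations(range(n)):
        k = count_cycles(perm)
        weights[k] += Fraction(theta) ** k  # weight theta^{number of cycles}
    total = sum(weights)                    # the normalizing constant K_n(theta)
    return [w / total for w in weights]


def bernoulli_sum_law(n, theta):
    """Law of a sum of independent Ber(theta / (theta + i - 1)), i = 1, ..., n."""
    law = [Fraction(1)]
    for i in range(1, n + 1):
        p = Fraction(theta) / (Fraction(theta) + i - 1)
        new = [Fraction(0)] * (len(law) + 1)
        for k, q in enumerate(law):
            new[k] += q * (1 - p)   # the i-th variable equals 0
            new[k + 1] += q * p     # the i-th variable equals 1
        law = new
    return law


if __name__ == "__main__":
    print(cycle_count_law(4, Fraction(2, 3)))
    print(bernoulli_sum_law(4, Fraction(2, 3)))
```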
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Remark. As an alternative way of arriving at the result in the proposition, there is 
a nice probabilistic construction of uniformly random permutations ($\theta = 1$) that
immediately yields the result, and the construction can be amended to cover the
case of general $\theta$. See Exercise 5.4.

We now use Proposition 5.3 and Chebyshev’s inequality to prove the first 
theorem. 

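Proof of Theorem 5.1. Let $Z_{n,\theta} = \sum_{i=1}^n X_{\theta(\theta+i-1)^{-1}}$. By Proposition 5.3, it
suffices to show that

\[
\lim_{n\to\infty} P\Big( \Big| \frac{Z_{n,\theta}}{\log n} - \theta \Big| > \epsilon \Big) = 0, \quad \text{for all } \epsilon > 0. \tag{5.11}
\]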

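If $X_p \sim \mathrm{Ber}(p)$, then the expected value of $X_p$ is $E X_p = p$, and the variance
is $\mathrm{Var}(X_p) = p(1 - p)$. Since the expectation is linear, we have
$E Z_{n,\theta} = \sum_{i=1}^n \frac{\theta}{\theta + i - 1}$. By considering the above sum simultaneously as an upper Riemann
sum and as a lower Riemann sum of appropriate integrals, we have

\[
\theta \big( \log(n + \theta) - \log \theta \big) = \theta \int_0^n \frac{1}{\theta + x}\, dx
\le \sum_{i=1}^n \frac{\theta}{\theta + i - 1} = E Z_{n,\theta}
\le 1 + \theta \int_0^{n-1} \frac{1}{\theta + x}\, dx = 1 + \theta \big( \log(n - 1 + \theta) - \log \theta \big).
\]

Since $\log(n + \theta) = \log n + \log(1 + \frac{\theta}{n})$ and $\log(n - 1 + \theta) = \log n + \log(1 + \frac{\theta - 1}{n})$,
the above inequality immediately yields

\[
E Z_{n,\theta} = \theta \log n + O(1), \quad \text{as } n \to \infty. \tag{5.12}
\]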


Since the variance of a sum of independent random variables is the sum of the
variances of the random variables, we have $\mathrm{Var}(Z_{n,\theta}) = \sum_{i=1}^n \frac{\theta (i-1)}{(\theta + i - 1)^2}$. Similar to
the integral estimate above for the expectation, we have

\[
\mathrm{Var}(Z_{n,\theta}) \le \theta \sum_{i=2}^n \frac{1}{i-1} \le \theta + \theta \int_1^{n-1} \frac{1}{x}\, dx = \theta + \theta \log(n - 1). \tag{5.13}
\]

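Using (5.12) for the last inequality below, we have for sufficiently large $n$

\[
\begin{aligned}
P\Big( \Big| \frac{Z_{n,\theta}}{\log n} - \theta \Big| > \epsilon \Big)
&= P\big( |Z_{n,\theta} - \theta \log n| > \epsilon \log n \big) \\
&= P\big( |(Z_{n,\theta} - E Z_{n,\theta}) + (E Z_{n,\theta} - \theta \log n)| > \epsilon \log n \big) \\
&\le P\big( |Z_{n,\theta} - E Z_{n,\theta}| > \epsilon \log n - |E Z_{n,\theta} - \theta \log n| \big) \\
&\le P\big( |Z_{n,\theta} - E Z_{n,\theta}| > \tfrac{1}{2} \epsilon \log n \big).
\end{aligned} \tag{5.14}
\]

Applying Chebyshev’s inequality to the last term in (5.14), it follows from (5.13)
and (5.14) that for sufficiently large $n$,

\[
P\Big( \Big| \frac{Z_{n,\theta}}{\log n} - \theta \Big| > \epsilon \Big) \le \frac{\theta + \theta \log(n - 1)}{\frac{1}{4} \epsilon^2 \log^2 n}. \tag{5.15}
\]

Now (5.11) follows from (5.15). □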
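The quantities in this proof are simple enough to tabulate. The sketch below evaluates the mean and variance of the Bernoulli sum directly, compares them with the integral bounds used above, and evaluates the right-hand side of (5.15). The function names are illustrative, floating-point arithmetic is used, and the values of $\theta$, $n$ and $\epsilon$ in the demonstration are arbitrary choices.

```python
from math import log


def mean_and_variance(n, theta):
    """Exact mean and variance of Z_{n,theta} as finite sums."""
    mean = sum(theta / (theta + i - 1) for i in range(1, n + 1))
    var = sum(theta * (i - 1) / (theta + i - 1) ** 2 for i in range(1, n + 1))
    return mean, var


def mean_bounds(n, theta):
    """Lower and upper integral bounds for the mean, as in the proof."""
    lower = theta * (log(n + theta) - log(theta))
    upper = 1 + theta * (log(n - 1 + theta) - log(theta))
    return lower, upper


def chebyshev_bound(n, theta, eps):
    """Right-hand side of (5.15)."""
    return (theta + theta * log(n - 1)) / (0.25 * eps ** 2 * log(n) ** 2)


if __name__ == "__main__":
    theta, eps = 2.0, 0.5
    for n in (10, 10**3, 10**5):
        mean, var = mean_and_variance(n, theta)
        print(n, mean / log(n), var, chebyshev_bound(n, theta, eps))
```

The bound in (5.15) decays only like $1/\log n$, so the printed values shrink slowly; this reflects the logarithmic scale of the number of cycles.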

We now develop a framework that will lead to the proof of Theorem 5.2. Given a
positive integer $n$ and given a collection $\{a_i\}_{i=1}^n$ of nonnegative integers satisfying
$\sum_{i=1}^n i a_i = n$, let $c_n(a_1, \ldots, a_n)$ denote the number of permutations $\sigma \in S_n$ with
cycle type $(a_1, \ldots, a_n)$. From the definition of $P_n^{(\theta)}$, we have

\[
P_n^{(\theta)}\big( C_1^{(n)} = a_1, \ldots, C_n^{(n)} = a_n \big) = \frac{\theta^{\sum_{i=1}^n a_i}\, c_n(a_1, \ldots, a_n)}{\theta^{(n)}}.
\]
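To prove Theorem 5.2, we need to analyze $P_n^{(\theta)}\big( C_1^{(n)} = m_1, \ldots, C_j^{(n)} = m_j \big)$, for
large $n$ and fixed $j$. We have

\[
\begin{aligned}
& P_n^{(\theta)}\big( C_1^{(n)} = m_1, C_2^{(n)} = m_2, \ldots, C_j^{(n)} = m_j \big) \\
&\quad = \sum_{\substack{\sum_{i=1}^j i m_i + \sum_{i=j+1}^n i a_i = n \\ a_{j+1} \ge 0, \ldots, a_n \ge 0}}
P_n^{(\theta)}\big( C_1^{(n)} = m_1, \ldots, C_j^{(n)} = m_j, C_{j+1}^{(n)} = a_{j+1}, \ldots, C_n^{(n)} = a_n \big) \\
&\quad = \sum_{\substack{\sum_{i=1}^j i m_i + \sum_{i=j+1}^n i a_i = n \\ a_{j+1} \ge 0, \ldots, a_n \ge 0}}
\frac{\theta^{\sum_{i=1}^j m_i + \sum_{i=j+1}^n a_i}\, c_n(m_1, \ldots, m_j, a_{j+1}, \ldots, a_n)}{\theta^{(n)}}.
\end{aligned} \tag{5.16}
\]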

We calculate $c_n(a_1, \ldots, a_n)$ by direct combinatorial reasoning.

Lemma 5.1.

\[
c_n(a_1, \ldots, a_n) = \frac{n!}{\prod_{i=1}^n i^{a_i} a_i!}.
\]


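Remark. From the lemma and (5.16), we obtain

\[
P_n^{(\theta)}\big( C_1^{(n)} = m_1, C_2^{(n)} = m_2, \ldots, C_j^{(n)} = m_j \big)
= \frac{n!}{\theta^{(n)}} \prod_{i=1}^j \Big( \frac{\theta}{i} \Big)^{m_i} \frac{1}{m_i!}
\sum_{\substack{\sum_{i=1}^j i m_i + \sum_{i=j+1}^n i a_i = n \\ a_{j+1} \ge 0, \ldots, a_n \ge 0}}
\prod_{i=j+1}^n \Big( \frac{\theta}{i} \Big)^{a_i} \frac{1}{a_i!}.
\]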
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The sum on the right hand side above is a real mess; however, a sophisticated 
application of generating functions in conjunction with the lemma will allow us 
to evaluate the right hand side of (5.16) indirectly. 

Proof of Lemma 5.1. First we separate out $a_1$ numbers for 1-cycles, $2a_2$ numbers
for 2-cycles, ..., $(n-1)a_{n-1}$ numbers for $(n-1)$-cycles, and finally the last $n a_n$
numbers for $n$-cycles. The number of ways of doing this is

\[
\binom{n}{a_1} \binom{n - a_1}{2a_2} \cdots \binom{n - a_1 - \cdots - (n-1)a_{n-1}}{n a_n}
= \frac{n!}{a_1! (2a_2)! \cdots (n a_n)!}.
\]

The $a_1$ numbers selected for 1-cycles need no further differentiation. The $2a_2$
numbers selected for 2-cycles must be separated out into $a_2$ pairs. Of course the
order of the pairs is irrelevant, so the number of ways of doing this is

\[
\frac{1}{a_2!} \binom{2a_2}{2} \binom{2a_2 - 2}{2} \cdots \binom{4}{2} \binom{2}{2} = \frac{(2a_2)!}{a_2! (2!)^{a_2}}.
\]

The $3a_3$ numbers selected for 3-cycles must be separated out into $a_3$ triplets, and then
each such triplet must be ordered in a cycle. The number of ways of separating the
$3a_3$ numbers into triplets is

\[
\frac{1}{a_3!} \binom{3a_3}{3} \binom{3a_3 - 3}{3} \cdots \binom{6}{3} \binom{3}{3} = \frac{(3a_3)!}{a_3! (3!)^{a_3}}.
\]

Each such triplet can be ordered into a cycle in $(3-1)!$ ways. Thus, we conclude that
the $3a_3$ numbers can be arranged into $a_3$ 3-cycles in $\frac{((3-1)!)^{a_3} (3a_3)!}{a_3! (3!)^{a_3}}$ ways. Continuing
like this, we obtain

\[
c_n(a_1, \ldots, a_n) = \frac{n!}{a_1! (2a_2)! \cdots (n a_n)!} \cdot \frac{(2a_2)!}{a_2! (2!)^{a_2}} \cdot \frac{((3-1)!)^{a_3} (3a_3)!}{a_3! (3!)^{a_3}} \cdots \frac{((n-1)!)^{a_n} (n a_n)!}{a_n! (n!)^{a_n}}
= \frac{n!}{a_1! a_2! \cdots a_n!\, 2^{a_2} 3^{a_3} \cdots n^{a_n}}.
\]

□
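Lemma 5.1 lends itself to a brute-force check: enumerate $S_n$ for a small $n$, tally the permutations by cycle type, and compare each tally with $n! / \prod_i i^{a_i} a_i!$. A minimal sketch follows, with illustrative function names and permutations acting on {0, ..., n-1}.

```python
from collections import Counter
from itertools import permutations
from math import factorial, prod


def cycle_type(perm):
    """Cycle type (a_1, ..., a_n) of a permutation of range(n) given as a tuple of images."""
    n = len(perm)
    a = [0] * n
    seen = set()
    for start in range(n):
        if start in seen:
            continue
        length, j = 0, start
        while j not in seen:
            seen.add(j)
            length += 1
            j = perm[j]
        a[length - 1] += 1  # one more cycle of this length
    return tuple(a)


def lemma_count(a):
    """n! / prod_i (i^{a_i} a_i!), the count asserted in Lemma 5.1."""
    n = sum(i * a_i for i, a_i in enumerate(a, start=1))
    denom = prod(i ** a_i * factorial(a_i) for i, a_i in enumerate(a, start=1))
    return factorial(n) // denom


if __name__ == "__main__":
    n = 6
    observed = Counter(cycle_type(p) for p in permutations(range(n)))
    for a, count in sorted(observed.items()):
        print(a, count, lemma_count(a))
```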

We now turn to generating functions. Consider an infinite dimensional vector
$x = (x_1, x_2, \ldots)$, and for any positive integer $n$, define $x^{(n)} = (x_1, \ldots, x_n)$. For
$a = (a_1, \ldots, a_n)$, let $x^a = (x^{(n)})^a := x_1^{a_1} \cdots x_n^{a_n}$. Let $T(\sigma)$ denote the cycle type
of $\sigma \in S_n$. Define the cycle index of $S_n$, $n \ge 1$, by

\[
\phi_n(x) = \phi_n(x^{(n)}) = \frac{1}{n!} \sum_{\sigma \in S_n} x^{T(\sigma)}
= \frac{1}{n!} \sum_{\substack{\sum_{j=1}^n j a_j = n \\ a_1 \ge 0, \ldots, a_n \ge 0}} c_n(a)\, x^a.
\]

We also define $\phi_0(x) = 1$. We now consider (formally for the moment) the
generating function for $\phi_n(\theta x)$:

\[
G^{(\theta)}(x, t) = \sum_{n=0}^\infty \phi_n(\theta x)\, t^n, \quad x = (x_1, x_2, \ldots).
\]

Using Lemma 5.1, we can obtain a very nice representation for $G^{(\theta)}$, as well as a
domain on which its defining series converges. Let $\|x\|_\infty := \sup_{n \ge 1} |x_n|$.

Proposition 5.4.

\[
G^{(\theta)}(x, t) = \exp\Big( \theta \sum_{i=1}^\infty \frac{x_i t^i}{i} \Big), \quad \text{for } |t| < 1,\ \|x\|_\infty < \infty.
\]

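Proof. Consider $t \in [0, 1)$ and $x$ with $x_j \ge 0$ for all $j$, and $\|x\|_\infty < \infty$. Using
Lemma 5.1 and the definition of $\phi_n(x)$, we have

\[
\begin{aligned}
G^{(\theta)}(x, t)
&= \sum_{n=0}^\infty \sum_{\substack{\sum_{j=1}^n j a_j = n \\ a_1 \ge 0, \ldots, a_n \ge 0}} \frac{c_n(a) (\theta x)^a t^n}{n!}
= \sum_{n=0}^\infty \sum_{\substack{\sum_{j=1}^n j a_j = n \\ a_1 \ge 0, \ldots, a_n \ge 0}} \frac{n!}{\prod_{i=1}^n i^{a_i} a_i!} \cdot \frac{(\theta x_1)^{a_1} \cdots (\theta x_n)^{a_n} t^n}{n!} \\
&= \sum_{n=0}^\infty \sum_{\substack{\sum_{j=1}^n j a_j = n \\ a_1 \ge 0, \ldots, a_n \ge 0}} \prod_{i=1}^n \frac{\big( \frac{\theta x_i t^i}{i} \big)^{a_i}}{a_i!}
= \sum_{a_1 \ge 0, a_2 \ge 0, \ldots} \prod_{i=1}^\infty \frac{\big( \frac{\theta x_i t^i}{i} \big)^{a_i}}{a_i!} \\
&= \prod_{i=1}^\infty \sum_{a_i = 0}^\infty \frac{\big( \frac{\theta x_i t^i}{i} \big)^{a_i}}{a_i!}
= \prod_{i=1}^\infty e^{\frac{\theta x_i t^i}{i}}
= \exp\Big( \theta \sum_{i=1}^\infty \frac{x_i t^i}{i} \Big).
\end{aligned} \tag{5.17}
\]

The right hand side above converges for $t$ and $x$ in the range specified at the
beginning of the proof. Since all of the summands in sight are nonnegative, it follows
that the series defining $G^{(\theta)}$ is convergent in this range. For $t$ and $x$ in the range
specified in the statement of the proposition, the above calculation shows that there is
absolute convergence and hence convergence. □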
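Proposition 5.4 can be tested coefficient by coefficient for small $n$: the coefficient of $t^n$ in $\exp(\theta \sum_i x_i t^i / i)$ should equal $\phi_n(\theta x)$ computed straight from the definition of the cycle index. The sketch below is illustrative and rests on two assumptions not in the text: the inputs are rational (for exact arithmetic), and the exponential of a power series $A(t)$ with $A(0) = 0$ is expanded via the standard recurrence $n f_n = \sum_{k=1}^n k\, a_k f_{n-k}$, obtained by differentiating $F = e^{A}$.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def cycle_lengths(perm):
    """Lengths of the cycles of a permutation of range(n), given as a tuple of images."""
    seen, lengths = set(), []
    for start in range(len(perm)):
        if start in seen:
            continue
        length, j = 0, start
        while j not in seen:
            seen.add(j)
            length += 1
            j = perm[j]
        lengths.append(length)
    return lengths


def cycle_index(n, theta, x):
    """phi_n(theta x): average over S_n of the product of theta * x_len over cycles."""
    total = Fraction(0)
    for perm in permutations(range(n)):
        term = Fraction(1)
        for length in cycle_lengths(perm):
            term *= theta * x[length - 1]
        total += term
    return total / factorial(n)


def exp_series_coefficients(theta, x, n_max):
    """Coefficients of t^0, ..., t^n_max in exp(theta * sum_i x_i t^i / i)."""
    a = [Fraction(0)] + [theta * x[i - 1] / i for i in range(1, n_max + 1)]
    f = [Fraction(1)]
    for n in range(1, n_max + 1):
        # F' = A' F gives n f_n = sum_{k=1}^{n} k a_k f_{n-k}
        f.append(sum(k * a[k] * f[n - k] for k in range(1, n + 1)) / n)
    return f


if __name__ == "__main__":
    theta = Fraction(3, 2)
    x = [Fraction(2), Fraction(1, 3), Fraction(5), Fraction(1), Fraction(7, 2)]
    series = exp_series_coefficients(theta, x, 5)
    for n in range(6):
        print(n, series[n], cycle_index(n, theta, x))
```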

We now exploit the formula for $G^{(\theta)}(x, t)$ in Proposition 5.4 in a clever way.
Recall that

\[
\log(1 - t) = -\sum_{i=1}^\infty \frac{t^i}{i}. \tag{5.18}
\]

For $x = (x_1, x_2, \ldots)$ and a positive integer $j$, let $x^{(j);1} = (x_1, \ldots, x_j, 1, 1, \ldots)$. In
other words, $x^{(j);1}$ is the infinite dimensional vector which coincides with $x$ in its
first $j$ places and has 1 in all of its other places. From Proposition 5.4 and (5.18) we
have

\[
G^{(\theta)}(x^{(j);1}, t) = \exp\Big( \theta \sum_{i=1}^j \frac{x_i t^i}{i} \Big) \exp\Big( \theta \sum_{i=j+1}^\infty \frac{t^i}{i} \Big)
= \frac{1}{(1-t)^\theta} \exp\Big( \theta \sum_{i=1}^j \frac{(x_i - 1) t^i}{i} \Big). \tag{5.19}
\]


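We will need the following lemma.

Lemma 5.2. Let $\theta \in (0, \infty)$. Let $\sum_{i=0}^\infty b_i$ be a convergent series, and assume that

\[
\frac{1}{(1-t)^\theta} \sum_{i=0}^\infty b_i t^i = \sum_{i=0}^\infty \gamma_i t^i, \quad |t| < 1.
\]

If $\theta > 1$, also assume that $\sum_{i=0}^\infty |b_i| < \infty$. If $\theta \in (0, 1)$, also assume that
$\sum_{i=0}^\infty s^i |b_i| < \infty$, for some $s > 1$. Then

\[
\lim_{n\to\infty} \frac{n!}{\theta^{(n)}} \gamma_n = \sum_{i=0}^\infty b_i.
\]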

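Proof. Since $\frac{d^n}{dt^n} (1-t)^{-\theta} \big|_{t=0} = \theta(\theta + 1) \cdots (\theta + n - 1) = \theta^{(n)}$, the Taylor expansion
for $\frac{1}{(1-t)^\theta}$ is given by

\[
\frac{1}{(1-t)^\theta} = \sum_{n=0}^\infty \frac{\theta^{(n)}}{n!} t^n, \tag{5.20}
\]

where for convenience we have defined $\theta^{(0)} = 1$. Thus, the Taylor expansion for
$\frac{1}{(1-t)^\theta} \sum_{i=0}^\infty b_i t^i$ is given by

\[
\frac{1}{(1-t)^\theta} \sum_{i=0}^\infty b_i t^i = \sum_{n=0}^\infty d_n t^n, \quad \text{where } d_n = \sum_{i=0}^n b_i \frac{\theta^{(n-i)}}{(n-i)!}.
\]

Therefore, by the assumption in the lemma, we have

\[
\gamma_n = \sum_{i=0}^n b_i \frac{\theta^{(n-i)}}{(n-i)!}.
\]

If $\theta = 1$, then $k! = \theta^{(k)}$, for all $k$. Consequently the above equation reduces to
$\gamma_n = \sum_{i=0}^n b_i$, and thus the statement of the lemma holds. When $\theta \neq 1$, then using
the additional assumptions on $\{b_i\}_{i=0}^\infty$, we can show that

\[
\lim_{n\to\infty} \frac{n!}{\theta^{(n)}} \sum_{i=0}^n b_i \frac{\theta^{(n-i)}}{(n-i)!} = \sum_{i=0}^\infty b_i, \tag{5.21}
\]

which finishes the proof of the lemma. The reader is guided through a proof of (5.21)
in Exercise 5.5. □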

We can now give the proof of Theorem 5.2. 

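Proof of Theorem 5.2. From (5.19) and the original definition of $G^{(\theta)}(x, t)$, we have

\[
\frac{1}{(1-t)^\theta} \exp\Big( \theta \sum_{i=1}^j \frac{(x_i - 1) t^i}{i} \Big) = \sum_{n=0}^\infty \phi_n(\theta x^{(j);1})\, t^n. \tag{5.22}
\]

Considering $x$ and $\theta$ as constants, we apply Lemma 5.2 to (5.22). In terms of the
lemma, we have

\[
\gamma_n = \phi_n(\theta x^{(j);1}) \quad \text{and} \quad \exp\Big( \theta \sum_{i=1}^j \frac{(x_i - 1) t^i}{i} \Big) = \sum_{i=0}^\infty b_i t^i. \tag{5.23}
\]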

In order to be able to apply the lemma for all $\theta > 0$, we need to show that
$\sum_{i=0}^\infty s^i |b_i| < \infty$, for some $s > 1$. Define $\{B_i\}_{i=0}^\infty$ by

\[
\exp\Big( \theta \sum_{i=1}^j \frac{(|x_i| + 1) t^i}{i} \Big) = \sum_{i=0}^\infty B_i t^i. \tag{5.24}
\]

Since all of the coefficients in the sum in the exponent on the left hand side of (5.24)
are nonnegative, we have $B_i \ge |b_i| \ge 0$, for all $i$. The reader is asked to prove this
in Exercise 5.6. The function on the left hand side of (5.24) is real analytic for all
$t \in \mathbb{R}$ (and complex analytic for all complex $t$); consequently, its power series on
the right hand side converges for all $t \in \mathbb{R}$. From this and the nonnegativity of $B_i$, it
follows that $\sum_{i=0}^\infty s^i B_i < \infty$, for all $s \ge 0$, and then, since $|b_i| \le B_i$, we conclude
that $\sum_{i=0}^\infty s^i |b_i| < \infty$, for all $s \ge 0$.

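By definition, from (5.23), setting $t = 1$, we have

\[
\sum_{i=0}^\infty b_i = \exp\Big( \theta \sum_{i=1}^j \frac{x_i - 1}{i} \Big). \tag{5.25}
\]

Consider now

\[
\frac{n!}{\theta^{(n)}} \gamma_n = \frac{n!}{\theta^{(n)}} \phi_n(\theta x^{(j);1})
= \frac{1}{\theta^{(n)}} \sum_{\substack{\sum_{i=1}^n i a_i = n \\ a_1 \ge 0, \ldots, a_n \ge 0}} c_n(a) (\theta x^{(j);1})^a. \tag{5.26}
\]

For any given $j$-vector $(m_1, \ldots, m_j)$ with nonnegative integral entries, the
coefficient of $x_1^{m_1} x_2^{m_2} \cdots x_j^{m_j}$ in (5.26) is

\[
\frac{1}{\theta^{(n)}} \sum_{\substack{\sum_{i=1}^j i m_i + \sum_{i=j+1}^n i a_i = n \\ a_{j+1} \ge 0, \ldots, a_n \ge 0}}
\theta^{\sum_{i=1}^j m_i + \sum_{i=j+1}^n a_i}\, c_n(m_1, \ldots, m_j, a_{j+1}, \ldots, a_n).
\]

But by (5.16), this is exactly $P_n^{(\theta)}\big( C_1^{(n)} = m_1, C_2^{(n)} = m_2, \ldots, C_j^{(n)} = m_j \big)$. By
Lemma 5.2, $\lim_{n\to\infty} \frac{n!}{\theta^{(n)}} \gamma_n$ exists, and this is true for every choice of $x$ and $\theta$; thus,
we conclude that

\[
\lim_{n\to\infty} \frac{n!}{\theta^{(n)}} \gamma_n = \lim_{n\to\infty} \frac{n!}{\theta^{(n)}} \phi_n(\theta x^{(j);1})
= \sum_{m_1 \ge 0, \ldots, m_j \ge 0} p_{m_1, \ldots, m_j}(\theta)\, x_1^{m_1} \cdots x_j^{m_j}, \tag{5.27}
\]

where

\[
p_{m_1, \ldots, m_j}(\theta) = \lim_{n\to\infty} P_n^{(\theta)}\big( C_1^{(n)} = m_1, C_2^{(n)} = m_2, \ldots, C_j^{(n)} = m_j \big). \tag{5.28}
\]

Applying Lemma 5.2, we conclude from (5.25) and (5.27) that

\[
\exp\Big( \theta \sum_{i=1}^j \frac{x_i - 1}{i} \Big) = \sum_{m_1 \ge 0, \ldots, m_j \ge 0} p_{m_1, \ldots, m_j}(\theta)\, x_1^{m_1} \cdots x_j^{m_j}. \tag{5.29}
\]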


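On the one hand, (5.29) shows that the coefficient of $x_1^{m_1} \cdots x_j^{m_j}$ in the Taylor
expansion about $x = 0$ of the function $\exp\big( \theta \sum_{i=1}^j \frac{x_i - 1}{i} \big)$ is $p_{m_1, \ldots, m_j}(\theta)$. On the
other hand, by Taylor’s formula, this coefficient is equal to

\[
\frac{1}{m_1! \cdots m_j!} \frac{\partial^{m_1 + \cdots + m_j} \exp\big( \theta \sum_{i=1}^j \frac{x_i - 1}{i} \big)}{\partial x_1^{m_1} \cdots \partial x_j^{m_j}} \Big|_{x=0}
= \frac{1}{m_1! \cdots m_j!} \exp\Big( -\theta \sum_{i=1}^j \frac{1}{i} \Big) \prod_{i=1}^j \Big( \frac{\theta}{i} \Big)^{m_i}
= \prod_{i=1}^j e^{-\theta/i} \frac{(\theta/i)^{m_i}}{m_i!}. \tag{5.30}
\]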
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Thus, from (5.28)-(5.30), we conclude that 


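\[
\lim_{n\to\infty} P_n^{(\theta)}\big( C_1^{(n)} = m_1, C_2^{(n)} = m_2, \ldots, C_j^{(n)} = m_j \big) = \prod_{i=1}^j e^{-\theta/i} \frac{(\theta/i)^{m_i}}{m_i!},
\]

completing the proof of Theorem 5.2. □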


Exercise 5.1. Verify that Proposition 5.1 follows from the definition of $P_n^{(\theta)}$ along
with Proposition 5.2 and Lemma 5.1.

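Exercise 5.2. Show that (5.4) is equivalent to

\[
\lim_{n\to\infty} P_n^{(\theta)}\big( C_1^{(n)} \le m_1, \ldots, C_j^{(n)} \le m_j \big)
= \prod_{i=1}^j \sum_{k=0}^{m_i} e^{-\theta/i} \frac{(\theta/i)^k}{k!}, \quad m_i \ge 0,\ i = 1, \ldots, j. \tag{5.31}
\]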


Exercise 5.3. In this exercise you will show directly that (5.5) follows from (5.4).

(a) Fix an integer $j \ge 2$. Use (5.31) to show that for any $\epsilon > 0$, there exists an $N_\epsilon$
such that if $n \ge N_\epsilon$ and $m \ge N_\epsilon$, then

\[
P_n^{(\theta)}\big( C_i^{(n)} \ge m, \text{ for some } i \in [j] \big) < \epsilon. \tag{5.32}
\]

(b) From (5.31) and (5.32), deduce that (5.31) also holds if some of the $m_i$ are equal
to $\infty$.

(c) Prove that (5.5) follows from (5.4).

Exercise 5.4. This exercise gives an alternative probabilistic proof of
Proposition 5.3. A uniformly random ($\theta = 1$) permutation $\sigma \in S_n$ can be constructed in
the following manner via its cycles. We begin with the number 1. Now we randomly
choose a number from $[n]$. If we chose $j$, then we declare that $\sigma_1 = j$. This is the
first stage of the construction. If $j \neq 1$, then we randomly choose a number from
$[n] - \{j\}$. If we chose $k$, then we declare that $\sigma_j = k$. This is the second stage of
the construction. If $k \neq 1$, then we randomly choose a number from $[n] - \{j, k\}$.
We continue like this until we finally choose 1, which closes the cycle. For example,
if after $k$ we chose 1, then the permutation $\sigma$ would contain the cycle $(1\,j\,k)$. Once
we close a cycle, we begin again, starting with the smallest number that has not yet
been used. We continue like this for $n$ stages, at which point the permutation $\sigma$ has
been defined completely.

(a) The above construction has $n$ stages. Show that the probability of completing a
cycle on the $j$th stage is $\frac{1}{n+1-j}$. Thus, letting

\[
X_{\frac{1}{n+1-j}} = \begin{cases} 1, & \text{if a cycle was completed at stage } j; \\ 0, & \text{otherwise,} \end{cases}
\]

it follows that $X_{\frac{1}{n+1-j}} \sim \mathrm{Ber}\big( \frac{1}{n+1-j} \big)$.

(b) Argue that $\{X_{\frac{1}{n+1-j}}\}_{j=1}^n$ are independent.

(c) Show that the number of cycles $N^{(n)}$ can be represented as $N^{(n)} = \sum_{j=1}^n X_{\frac{1}{n+1-j}}$,
thereby proving Proposition 5.3 in the case $\theta = 1$.

(d) Let $\theta \in (0, \infty)$. Amend the above construction as follows. At any stage $j$,
close the cycle with probability $\frac{\theta}{\theta + n - j}$, and choose any other particular number
that has not yet been used with probability $\frac{1}{\theta + n - j}$. Show that this construction
yields a permutation distributed according to $P_n^{(\theta)}$, and use the above reasoning
to prove Proposition 5.3 for all $\theta > 0$.
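The stage-by-stage construction described in this exercise can be simulated directly, which may help in experimenting with parts (a) to (d). The following is only a sketch of the construction as stated, not a solution of the exercise; the function name, the `rng` parameter, and the returned pair (the permutation as a dictionary, together with its number of cycles) are choices made for this example.

```python
import random


def random_permutation_by_cycles(n, theta=1.0, rng=random):
    """Build a permutation sigma of {1, ..., n} one image per stage.

    At a stage where m numbers are still unused, the current cycle is closed
    with probability theta / (theta + m); otherwise one of the m unused numbers
    is chosen uniformly. Returns (sigma as a dict, number of cycles).
    """
    unused = set(range(1, n + 1))
    sigma = {}
    cycles = 0
    while unused:
        start = min(unused)  # smallest number not yet used opens a new cycle
        unused.discard(start)
        current = start
        while True:
            if rng.random() < theta / (theta + len(unused)):
                sigma[current] = start  # close the cycle
                cycles += 1
                break
            nxt = rng.choice(sorted(unused))
            unused.discard(nxt)
            sigma[current] = nxt
            current = nxt
    return sigma, cycles


if __name__ == "__main__":
    random.seed(1)
    n, theta, trials = 200, 2.0, 2000
    average = sum(random_permutation_by_cycles(n, theta)[1] for _ in range(trials)) / trials
    expected = sum(theta / (theta + i - 1) for i in range(1, n + 1))
    print(average, expected)
```

With $\theta = 1$ the closing probability at stage $j$ is $\frac{1}{n+1-j}$, matching part (a); the printed empirical average of the number of cycles can be compared with the sum of the Bernoulli parameters in Proposition 5.3.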


Exercise 5.5. (a) Show that if $\sum_{i=0}^\infty |b_i| < \infty$ and the triangular array
$\{c_{n,i} : i = 0, 1, \ldots, n;\ n = 0, 1, \ldots\}$ is bounded and satisfies $\lim_{n\to\infty} c_{n,n-i} = 1$, for all
$i$, then $\lim_{n\to\infty} \sum_{i=0}^n b_i c_{n,n-i} = \sum_{i=0}^\infty b_i$. Then use this to prove (5.21) in the
case that $\theta > 1$.

(b) Show that if $\theta \in (0, 1)$, then $\frac{n!}{\theta^{(n)}} \frac{\theta^{(n-i)}}{(n-i)!} \le \frac{n}{n-i}$, if $i < n$. Also, $\frac{n!}{\theta^{(n)}} \frac{\theta^{(0)}}{0!} \le \frac{n}{\theta}$,
where we recall, $\theta^{(0)} = 1$.

(c) Show that if $\sum_{i=0}^\infty s^i |b_i| < \infty$, where $s > 1$, then $|b_i| \le s^{-i}$, for all large $i$.

(d) For $\theta \in (0, 1)$, prove (5.21) as follows. Break the sum $\sum_{i=0}^n b_i \frac{n!}{\theta^{(n)}} \frac{\theta^{(n-i)}}{(n-i)!}$ into
three parts: from $i = 0$ to $i = N$, from $i = N + 1$ to $i = [\frac{n}{2}]$, and from
$i = [\frac{n}{2}] + 1$ to $i = n$. Use the reasoning in the proof of (a) to show that by
choosing $N$ sufficiently large, the limit as $n \to \infty$ of the first part can be made
arbitrarily close to $\sum_{i=0}^\infty b_i$. Use the fact that $\sum_{i=0}^\infty |b_i| < \infty$ to show that by
choosing $N$ sufficiently large, the $\limsup_{n\to\infty}$ of the second part can be made
arbitrarily small. Use (b) and (c) to show that the limit as $n \to \infty$ of the third
part is 0.


Exercise 5.6. Prove that $B_i \ge |b_i|$, where $\{b_i\}_{i=0}^\infty$ and $\{B_i\}_{i=0}^\infty$ are defined in (5.23)
and (5.24).


Exercise 5.7. Make a small change in the proof of Theorem 5.2 to show that (5.5) 
holds. 

Exercise 5.8. Consider the uniform probability measure $P_n^{(1)}$ on $S_n$ and let $E_n^{(1)}$
denote the expectation under $P_n^{(1)}$. Let $X_n = X_n(\sigma)$ be the random variable denoting
the number of nearest neighbor pairs in the permutation $\sigma \in S_n$, and let $Y_n = Y_n(\sigma)$
be the random variable denoting the number of nearest neighbor triples in $\sigma \in S_n$.
(A nearest neighbor pair for $\sigma$ is a pair $k, k+1$, with $k \in [n-1]$, such that $\sigma_i = k$
and $\sigma_{i+1} = k+1$, for some $i \in [n-1]$, and a nearest neighbor triple is a triple
$(k, k+1, k+2)$ with $k \in [n-2]$ such that $\sigma_i = k$, $\sigma_{i+1} = k+1$ and $\sigma_{i+2} = k+2$,
for some $i \in [n-2]$.)

(a) Show that $E_n^{(1)} X_n = \frac{n-1}{n}$, for all $n$. (Hint: Represent $X_n$ as the sum of indicator
random variables $\{I_k\}_{k=1}^{n-1}$, where $I_k(\sigma)$ is equal to 1 if $k, k+1$ is a nearest
neighbor pair in $\sigma$ and is equal to 0 otherwise.) It can be shown that the
distribution of $X_n$ converges weakly to the $\mathrm{Pois}(1)$ distribution as $n \to \infty$;
see [17].

(b) Show that $\lim_{n\to\infty} E_n^{(1)} Y_n = 0$ and conclude that $\lim_{n\to\infty} P_n^{(1)}(Y_n = 0) = 1$.


Chapter Notes 

In this chapter we investigated the limiting distribution as $n \to \infty$ of the
random vector denoting the number of cycles of lengths 1 through $j$ in a random
permutation from $S_n$. It is very interesting and more challenging to investigate
the limiting distribution of the random vector denoting the j longest cycles or, 
alternatively, the j shortest cycles. For everything you want to know about cycles in 
random permutations, and lots of references, see the book by Arratia et al. [6]. Our 
approach in this chapter was almost completely combinatorial, through the use of 
generating functions. Such methods are used occasionally in [6], but the emphasis is 
on more sophisticated probabilistic analysis. Our method is similar to the generating 
function approach of Wilf in [34], which deals only with the case $\theta = 1$. For
an expository account of the intertwining of combinatorial objects with stochastic 
processes, see the lecture notes of Pitman [30]. 


Chapter 6 

Chebyshev’s Theorem on the Asymptotic 
Density of the Primes 


Let $\pi(n)$ denote the number of primes that are no larger than $n$; that is,

\[
\pi(n) = \sum_{p \le n} 1,
\]

where here and elsewhere in this chapter and the next two, the letter $p$ in a
summation denotes a prime. Euclid proved that there are infinitely many primes:
$\lim_{n\to\infty} \pi(n) = \infty$. The asymptotic density of the primes is 0; that is,

\[
\lim_{n\to\infty} \frac{\pi(n)}{n} = 0.
\]

The prime number theorem gives the leading order asymptotic behavior of $\pi(n)$. It
states that

\[
\lim_{n\to\infty} \frac{\pi(n) \log n}{n} = 1.
\]

This landmark result was proved in 1896 independently by J. Hadamard and by C.J.
de la Vallée Poussin. Their proofs used contour integration and Cauchy’s theorem
from analytic function theory. A so-called “elementary” proof, that is, a proof that
does not use analytic function theory, was given by P. Erdős and A. Selberg in 1949.
Although their proof uses only elementary methods, it is certainly more involved
than the proofs of Hadamard and de la Vallée Poussin. We will not prove the prime
number theorem in this book. In this chapter we prove a precursor of the prime
number theorem, due to Chebyshev in 1850. Chebyshev was the first to prove that
$\pi(n)$ grows on the order $\frac{n}{\log n}$. Chebyshev’s methods were ingenious but entirely
elementary. Given the truly elementary nature of his approach, it is quite impressive
how close his result is to the prime number theorem. Here is Chebyshev’s result.


R.G. Pinsky, Problems from the Discrete to the Continuous, Universitext, 

DOI 10.1007/978-3-319-07965-3_6, © Springer International Publishing Switzerland 2014


Theorem 6.1 (Chebyshev).

\[
0.693 \approx \log 2 \le \liminf_{n\to\infty} \frac{\pi(n) \log n}{n} \le \limsup_{n\to\infty} \frac{\pi(n) \log n}{n} \le \log 4 \approx 1.386.
\]

Chebyshev’s result is not the type of result we are emphasizing in this book, since 
it is not an exact asymptotic result but rather only an estimate. We have included the 
result because we will need it to prove Mertens’ theorems in Chap. 7, and one of 
Mertens’ theorems will be used to prove the Hardy-Ramanujan theorem in Chap. 8. 
Define Chebyshev’s $\theta$-function by

\[
\theta(n) = \sum_{p \le n} \log p. \tag{6.1}
\]

Chebyshev realized that an understanding of the asymptotic behavior of $\theta(n)$ allows
one to infer the asymptotic behavior of $\pi(n)$ (and vice versa), and that the direct
asymptotic analysis of the function $\theta$ is much more tractable than that of the function
$\pi$, because the sum of logarithms is the logarithm of the product. Indeed, note that

\[
\theta(n) = \log \prod_{p \le n} p. \tag{6.2}
\]

We will give an exceedingly simple proof of the following result, which links the
asymptotic behavior of $\theta$ to that of $\pi$.


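Proposition 6.1.

(i) $\liminf_{n\to\infty} \frac{\theta(n)}{n} = \liminf_{n\to\infty} \frac{\pi(n) \log n}{n}$;

(ii) $\limsup_{n\to\infty} \frac{\theta(n)}{n} = \limsup_{n\to\infty} \frac{\pi(n) \log n}{n}$.

Proof. We have the trivial inequality

\[
\theta(n) = \sum_{p \le n} \log p \le \pi(n) \log n.
\]

Dividing this by $n$ and letting $n \to \infty$, we obtain

\[
\liminf_{n\to\infty} \frac{\theta(n)}{n} \le \liminf_{n\to\infty} \frac{\pi(n) \log n}{n}; \quad
\limsup_{n\to\infty} \frac{\theta(n)}{n} \le \limsup_{n\to\infty} \frac{\pi(n) \log n}{n}. \tag{6.3}
\]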

We have for € e (0, 1), 


6(h) > \og p > (n) — 7i ([n l e ])^logw 1 e > 

[n l ~ € ]<p<n 

(1 — e)(^7t(n) log n — [n l ~ e ] logw^, 
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where the last inequality comes from the trivial fact that n(y) < y. Dividing this by 
n and letting n -+ oo, and using the fact that € e (0, 1) is arbitrary, we obtain 

0(n) 7i(n)\ogn 0(n) n (n) log n 

liminf > liminf ; limsup > limsup . (6.4) 

77->00 n 77->00 n 77->00 W 77->00 W 

The proposition follows from (6.3) and (6.4). □ 
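The two inequalities used in the proof of Proposition 6.1 can be checked numerically. The following Python sketch computes $\theta(n)$ together with the upper bound $\pi(n)\log n$ and the lower bound $(1-\epsilon)(\pi(n)\log n - \lfloor n^{1-\epsilon}\rfloor\log n)$; the helper names and the choice $\epsilon = 0.5$ are illustrative assumptions.

```python
import math

def primes_up_to(limit):
    """Return the list of primes p <= limit (sieve of Eratosthenes)."""
    if limit < 2:
        return []
    sieve = [True] * (limit + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, math.isqrt(limit) + 1):
        if sieve[i]:
            for j in range(i * i, limit + 1, i):
                sieve[j] = False
    return [i for i, flag in enumerate(sieve) if flag]

def theta(n):
    """Chebyshev's theta-function (6.1): the sum of log p over primes p <= n."""
    return sum(math.log(p) for p in primes_up_to(n))

def sandwich(n, eps):
    """Return (lower, theta(n), upper) from the two inequalities in the proof."""
    pi_n = len(primes_up_to(n))
    upper = pi_n * math.log(n)
    cutoff = math.floor(n ** (1 - eps))
    lower = (1 - eps) * (pi_n * math.log(n) - cutoff * math.log(n))
    return lower, theta(n), upper

for n in (10**3, 10**4, 10**5):
    lower, th, upper = sandwich(n, eps=0.5)
    print(n, round(lower / n, 4), round(th / n, 4), round(upper / n, 4))
```

Taking $\epsilon$ smaller brings the lower bound closer to the upper bound for large $n$, which is how the proof concludes.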

The following theorem gives an upper bound on the Chebyshev $\theta$-function.

Theorem 6.2. $\frac{\theta(n)}{n} \le \log 4$, $n \ge 1$.

Proof. The proof is by induction, the inductive hypothesis being that $\theta(k) \le k \log 4$ for all $k \le n$.
Note that the hypothesis holds for $n = 1, 2$. If $n + 1 \ge 3$ is even, then $\theta(n+1) = \theta(n) \le n\log 4 < (n+1)\log 4$, where the first inequality comes from the inductive
hypothesis. If $n + 1$ is odd, then write $n + 1 = 2m + 1$, and note that the
binomial coefficient $\binom{2m+1}{m} = \frac{(2m+1)(2m)\cdots(m+2)}{m!}$ is divisible by every prime between
$m + 2$ and $2m + 1$ (since all such primes appear in the numerator of the latter
expression, but not in the denominator). Since $\binom{2m+1}{m}$ is a positive integer (all
binomial coefficients are integers) which contains as factors all the primes between
$m + 2$ and $2m + 1$, we have

$$\prod_{m+2 \le p \le 2m+1} p \le \binom{2m+1}{m}. \tag{6.5}$$

By the binomial formula,

$$2^{2m+1} = (1+1)^{2m+1} = \sum_{j=0}^{2m+1} \binom{2m+1}{j} \ge \binom{2m+1}{m} + \binom{2m+1}{m+1} = 2\binom{2m+1}{m};$$

thus,

$$\binom{2m+1}{m} \le 2^{2m}. \tag{6.6}$$

From (6.2), (6.5), and (6.6) we have

$$\theta(2m+1) - \theta(m+1) = \log \prod_{m+2 \le p \le 2m+1} p \le \log 2^{2m} = m\log 4. \tag{6.7}$$

From (6.7) and the inductive hypothesis (applied to $m + 1 \le n$), we have

$$\theta(2m+1) \le \theta(m+1) + m\log 4 \le (m+1)\log 4 + m\log 4 = (2m+1)\log 4;$$

that is, $\theta(n+1) \le (n+1)\log 4$.


□ 
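The two ingredients of the induction step, (6.5) and (6.6), as well as the conclusion of Theorem 6.2, are easy to test by machine for small values. The sketch below is only a sanity check under our own naming (`check_odd_step`, `theta_bound_holds`); it uses exact integer arithmetic for the binomial coefficients.

```python
import math

def is_prime(k):
    """Trial division; adequate for the small values used here."""
    return k >= 2 and all(k % d for d in range(2, math.isqrt(k) + 1))

def check_odd_step(m):
    """Check (6.5) and (6.6) for n + 1 = 2m + 1."""
    # Product of the primes p with m + 2 <= p <= 2m + 1 (empty product is 1).
    product = math.prod(p for p in range(m + 2, 2 * m + 2) if is_prime(p))
    binom = math.comb(2 * m + 1, m)
    divides = binom % product == 0  # every such prime divides the coefficient
    return divides and product <= binom <= 4 ** m

def theta_bound_holds(limit):
    """Check theta(n) <= n log 4 for all 1 <= n <= limit."""
    theta = 0.0
    for n in range(1, limit + 1):
        if is_prime(n):
            theta += math.log(n)
        if theta > n * math.log(4):
            return False
    return True

print(all(check_odd_step(m) for m in range(1, 300)))
print(theta_bound_holds(5000))
```

Both checks are guaranteed to succeed by the proof above; the code merely makes the divisibility argument concrete.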
As we noted above, the direct asymptotic analysis of $\theta$ is much more tractable
than that of $\pi$, and Theorem 6.2 carried out an upper bound analysis for $\theta$. It turns
out that for the lower-bound analysis it is better to work with Chebyshev’s $\psi$-function
instead of Chebyshev’s $\theta$-function. One defines

$$\psi(n) = \sum_{p^k \le n,\ k \ge 1} \log p. \tag{6.8}$$

That is, in the sum above, a term $\log p$ appears for every prime $p$ and integer $k \ge 1$
for which $p^k \le n$. So, for example, $\psi(14) = 3\log 2 + 2\log 3 + \log 5 + \log 7 + \log 11 + \log 13$. Of course, $\psi(n) \ge \theta(n)$. We show now that $\theta$ and $\psi$ have the same
asymptotic behavior.

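Proposition 6.2.

(i) $\liminf_{n\to\infty} \frac{\psi(n)}{n} = \liminf_{n\to\infty} \frac{\theta(n)}{n}$;

(ii) $\limsup_{n\to\infty} \frac{\psi(n)}{n} = \limsup_{n\to\infty} \frac{\theta(n)}{n}$.

Proof. Since $\psi(n) \ge \theta(n)$, we have

$$\liminf_{n\to\infty} \frac{\theta(n)}{n} \le \liminf_{n\to\infty} \frac{\psi(n)}{n}; \qquad \limsup_{n\to\infty} \frac{\theta(n)}{n} \le \limsup_{n\to\infty} \frac{\psi(n)}{n}. \tag{6.9}$$

Since $2^k \le n$ if and only if $k\log 2 \le \log n$, or equivalently, $k \le \lfloor \frac{\log n}{\log 2} \rfloor$, it follows
that $p^k > n$ for every prime $p$ and every $k > \lfloor \frac{\log n}{\log 2} \rfloor$; thus

$$\psi(n) - \theta(n) = \sum_{p^k \le n,\ k \ge 2} \log p = \sum_{k=2}^{\lfloor \frac{\log n}{\log 2} \rfloor} \sum_{p \le n^{1/k}} \log p = \sum_{k=2}^{\lfloor \frac{\log n}{\log 2} \rfloor} \theta(\lfloor n^{1/k} \rfloor) \le \Big\lfloor \frac{\log n}{\log 2} \Big\rfloor\, \theta(\lfloor n^{1/2} \rfloor). \tag{6.10}$$

Now trivially, $\theta(k) = \sum_{p \le k} \log p \le k\log k$. Using this with (6.10) gives

$$\psi(n) - \theta(n) \le \frac{(\log n)^2\, n^{1/2}}{2\log 2}. \tag{6.11}$$

From (6.11) it follows that

$$\liminf_{n\to\infty} \frac{\theta(n)}{n} \ge \liminf_{n\to\infty} \frac{\psi(n)}{n}; \qquad \limsup_{n\to\infty} \frac{\theta(n)}{n} \ge \limsup_{n\to\infty} \frac{\psi(n)}{n}. \tag{6.12}$$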


The proposition follows from (6.9) and (6.12). 


□ 
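To make the definition (6.8) and the bound (6.11) concrete, the following Python sketch evaluates $\psi$ and $\theta$ directly from their definitions, reproduces the value of $\psi(14)$ given above, and compares $\psi(n) - \theta(n)$ with the right hand side of (6.11). The function names are our own, and trial division is used only because the inputs are small.

```python
import math

def is_prime(k):
    """Trial division; adequate for the small values used here."""
    return k >= 2 and all(k % d for d in range(2, math.isqrt(k) + 1))

def theta(n):
    """Chebyshev's theta-function (6.1)."""
    return sum(math.log(p) for p in range(2, n + 1) if is_prime(p))

def psi(n):
    """Chebyshev's psi-function (6.8): one term log p per prime power p^k <= n."""
    total = 0.0
    for p in range(2, n + 1):
        if is_prime(p):
            power = p
            while power <= n:
                total += math.log(p)
                power *= p
    return total

def gap_bound(n):
    """Right hand side of (6.11)."""
    return math.sqrt(n) * math.log(n) ** 2 / (2 * math.log(2))

expected = 3 * math.log(2) + 2 * math.log(3) + sum(math.log(p) for p in (5, 7, 11, 13))
print(math.isclose(psi(14), expected))
for n in (10**2, 10**3, 10**4):
    print(n, round(psi(n) - theta(n), 3), round(gap_bound(n), 3))
```

The difference $\psi(n) - \theta(n)$ is of order at most $n^{1/2}(\log n)^2$, which is negligible after division by $n$; this is all that Proposition 6.2 needs.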
Remark. The bound obtained in (6.11) can be improved by replacing the trivial
bound on $\theta$, namely, $\theta(k) \le k\log k$, by the bound obtained from Theorem 6.2.

We will carry out a lower-bound analysis of $\psi$. This will be somewhat more
involved than the upper bound analysis for $\theta$ but still entirely elementary. For $n \in \mathbb{N}$
and $p$ a prime, let $v_p(n)$ denote the largest exponent $k$ such that $p^k \mid n$. One calls
$v_p(n)$ the $p$-adic value of $n$. It follows from the definition of $v_p$ that any positive
integer $n$ can be written as

$$n = \prod_{p} p^{v_p(n)}. \tag{6.13}$$

In Exercise 6.1 the reader is asked to prove the following simple formula:

$$v_p(mn) = v_p(m) + v_p(n), \quad m, n \in \mathbb{N}. \tag{6.14}$$

From (6.14) it follows that

$$v_p(n!) = \sum_{m=1}^{n} v_p(m). \tag{6.15}$$

We will need the following result.

Proposition 6.3.

$$v_p(n!) = \sum_{k=1}^{\infty} \Big\lfloor \frac{n}{p^k} \Big\rfloor.$$

Proof. We can write

$$v_p(m) = \sum_{1 \le k < \infty,\ p^k \mid m} 1.$$

Using this with (6.15), we have

$$v_p(n!) = \sum_{m=1}^{n} v_p(m) = \sum_{m=1}^{n}\ \sum_{1 \le k < \infty,\ p^k \mid m} 1 = \sum_{k=1}^{\infty}\ \sum_{1 \le m \le n,\ p^k \mid m} 1. \tag{6.16}$$

If $p^k > n$, then obviously there is no $m \in [n]$ for which $p^k \mid m$. If $p^k \le n$,
then the integers $m \in [n]$ for which $p^k \mid m$ are the $\lfloor \frac{n}{p^k} \rfloor$ integers $p^k, 2p^k, \ldots, \lfloor \frac{n}{p^k} \rfloor p^k$.

Thus, $\sum_{1 \le m \le n,\ p^k \mid m} 1 = \lfloor \frac{n}{p^k} \rfloor$. Substituting this in (6.16) completes the proof of the
proposition. □ 
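Proposition 6.3 translates directly into a short computation. The Python sketch below evaluates the sum $\sum_{k\ge 1}\lfloor n/p^k\rfloor$ (which has only finitely many nonzero terms) and compares it with the $p$-adic value of $n!$ computed from the definition; the names `legendre` and `p_adic_value` are illustrative.

```python
import math

def legendre(n, p):
    """Return v_p(n!) via Proposition 6.3: the sum of floor(n / p^k) over k >= 1."""
    total, power = 0, p
    while power <= n:  # terms with p^k > n vanish
        total += n // power
        power *= p
    return total

def p_adic_value(m, p):
    """Return v_p(m), the largest k such that p^k divides m (for m >= 1)."""
    count = 0
    while m % p == 0:
        m //= p
        count += 1
    return count

for n, p in ((10, 2), (10, 3), (100, 5)):
    print(n, p, legendre(n, p), p_adic_value(math.factorial(n), p))
```

For instance, $v_2(10!) = \lfloor 10/2\rfloor + \lfloor 10/4\rfloor + \lfloor 10/8\rfloor = 5 + 2 + 1 = 8$.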

We can now carry out a lower-bound analysis of $\psi$.

Theorem 6.3.

$$\liminf_{n\to\infty} \frac{\psi(n)}{n} \ge \log 2.$$

Proof. Consider the binomial coefficient $\binom{2n}{n} = \frac{(2n)!}{(n!)^2}$. Using (6.13) we have

$$\binom{2n}{n} = \frac{(2n)!}{(n!)^2} = \prod_{p} p^{v_p((2n)!) - 2v_p(n!)} = \prod_{p \le 2n} p^{v_p((2n)!) - 2v_p(n!)}, \tag{6.17}$$

where the final equality comes from the fact that neither $(2n)!$ nor $n!$ has a prime
factor larger than $2n$. From Proposition 6.3, we have

$$v_p((2n)!) - 2v_p(n!) = \sum_{k=1}^{\infty} \Big( \Big\lfloor \frac{2n}{p^k} \Big\rfloor - 2\Big\lfloor \frac{n}{p^k} \Big\rfloor \Big). \tag{6.18}$$

Of course, $\lfloor \frac{2n}{p^k} \rfloor = \lfloor \frac{n}{p^k} \rfloor = 0$ if $p^k > 2n$, that is, if $k > \lfloor \frac{\log 2n}{\log p} \rfloor$. Thus,
in the summation over $k$ above, we may replace the upper limit $\infty$ by $\lfloor \frac{\log 2n}{\log p} \rfloor$.
Furthermore, it is easy to verify that $\lfloor 2x \rfloor - 2\lfloor x \rfloor$ is equal to either 0 or 1, for all
real numbers $x$. From these two facts we obtain from (6.18) the estimate

$$0 \le v_p((2n)!) - 2v_p(n!) \le \Big\lfloor \frac{\log 2n}{\log p} \Big\rfloor. \tag{6.19}$$

From (6.17) and (6.19) we have the estimate

$$\binom{2n}{n} \le \prod_{p \le 2n} p^{\lfloor \frac{\log 2n}{\log p} \rfloor}. \tag{6.20}$$

On the other hand we have the easy estimate

$$\binom{2n}{n} \ge \frac{2^{2n}}{2n}. \tag{6.21}$$

To prove (6.21), note that the middle binomial coefficient $\binom{2n}{n}$ maximizes $\binom{2n}{k}$ over
$k \in [2n]$. The reader is asked to prove this in Exercise 6.2. Thus, we have

$$2^{2n} = (1+1)^{2n} = \sum_{k=0}^{2n} \binom{2n}{k} = 2 + \sum_{k=1}^{2n-1} \binom{2n}{k} \le 2 + (2n-1)\binom{2n}{n} \le 2n\binom{2n}{n}.$$

From (6.20) and (6.21), we conclude that

$$\frac{2^{2n}}{2n} \le \prod_{p \le 2n} p^{\lfloor \frac{\log 2n}{\log p} \rfloor},$$

or, equivalently,

$$2n\log 2 - \log 2n \le \sum_{p \le 2n} \Big\lfloor \frac{\log 2n}{\log p} \Big\rfloor \log p. \tag{6.22}$$

Recalling from (6.8) that $\psi(2n) = \sum_{p^k \le 2n,\ k \ge 1} \log p$, it follows that the summand
$\log p$ appears in $\psi(2n)$ one time for each $k \ge 1$ that satisfies $p^k \le 2n$; that is, the
summand $\log p$ appears $\lfloor \frac{\log 2n}{\log p} \rfloor$ times. Thus, the right hand side of (6.22) is equal to
$\psi(2n)$, giving the inequality

$$\psi(2n) \ge 2n\log 2 - \log 2n. \tag{6.23}$$

Of course then we also have

$$\psi(2n+1) \ge 2n\log 2 - \log 2n. \tag{6.24}$$

Dividing (6.23) by $2n$ and dividing (6.24) by $2n+1$, and letting $n \to \infty$, we
conclude that

$$\liminf_{n\to\infty} \frac{\psi(n)}{n} \ge \log 2,$$

which completes the proof of the theorem. □ 
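The estimates (6.19) and (6.21) in the proof of Theorem 6.3 can be verified exactly for small $n$. In the sketch below, `max_exponent` computes $\lfloor \log 2n/\log p\rfloor$ in integer arithmetic (to avoid floating-point rounding at exact powers), and `check` is a hypothetical helper name of our own.

```python
import math

def is_prime(k):
    """Trial division; adequate for the small values used here."""
    return k >= 2 and all(k % d for d in range(2, math.isqrt(k) + 1))

def legendre(n, p):
    """v_p(n!) computed by Proposition 6.3."""
    total, power = 0, p
    while power <= n:
        total += n // power
        power *= p
    return total

def max_exponent(p, limit):
    """Largest k >= 0 with p^k <= limit, i.e. floor(log(limit) / log(p))."""
    k, power = 0, p
    while power <= limit:
        k += 1
        power *= p
    return k

def check(n):
    """Check (6.19) for every prime p <= 2n, together with (6.21)."""
    exponents_ok = all(
        0 <= legendre(2 * n, p) - 2 * legendre(n, p) <= max_exponent(p, 2 * n)
        for p in range(2, 2 * n + 1) if is_prime(p)
    )
    lower_ok = 2 * n * math.comb(2 * n, n) >= 4 ** n
    return exponents_ok and lower_ok

print(all(check(n) for n in range(1, 200)))
```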

We can now prove Chebyshev’s theorem in one line. 

Proof of Theorem 6.1. The upper bound follows from Theorem 6.2 and part (ii) 
of Proposition 6.1, while the lower bound follows from Theorem 6.3, part (i) of 
Proposition 6.2, and part (i) of Proposition 6.1. □ 

Exercise 6.1. Prove (6.14): $v_p(mn) = v_p(m) + v_p(n)$, $m, n \in \mathbb{N}$.

Exercise 6.2. Prove that $\binom{2n}{n} = \max_{0 \le k \le 2n} \binom{2n}{k}$.

Exercise 6.3. Bertrand’s postulate states that for each positive integer $n$, there
exists a prime in the interval $(n, 2n)$. This result was first proven by Chebyshev.
Use the upper and lower bounds obtained in this chapter for Chebyshev’s $\theta$-function
to prove the following weak form of Bertrand’s postulate: For every $\epsilon > 0$, there
exists an $n_0(\epsilon)$ such that for every $n \ge n_0(\epsilon)$ there exists a prime in the interval
$(n, (2+\epsilon)n)$.

Chapter Notes

Chebyshev also proved that if $\lim_{n\to\infty} \frac{\pi(n)\log n}{n}$ exists, then this limit must be equal
to 1. For a proof, see Tenenbaum’s book [33]. Late in his life, in a letter, Gauss
recollected that in the early 1790s, when he was 15 or 16, he conjectured the prime
number theorem; however, he never published the conjecture. The theorem was
conjectured by Dirichlet in 1838. For some references for further reading, see the
notes at the end of Chap. 8.


Chapter 7 

Mertens’ Theorems on the Asymptotic 
Behavior of the Primes 


Given a sequence of positive numbers $\{a_n\}_{n=1}^{\infty}$ satisfying $\lim_{n\to\infty} a_n = \infty$, one
way to measure the rate at which the sequence approaches $\infty$ is to consider the rate
at which the series $\sum_{j=1}^{n} \frac{1}{a_j}$ grows. For $a_j = j$, it is well known that the harmonic
series $\sum_{j=1}^{n} \frac{1}{j}$ satisfies $\sum_{j=1}^{n} \frac{1}{j} = \log n + O(1)$ as $n \to \infty$. How does the harmonic
series of the primes behave? The goal of this chapter is to prove a theorem known
as Mertens’ second theorem.

Theorem 7.1.

$$\sum_{p \le n} \frac{1}{p} = \log\log n + O(1), \quad \text{as } n \to \infty.$$

Mertens’ second theorem will play a key role in the proof of the Hardy-Ramanujan
theorem in Chap. 8. For our proof of Mertens’ second theorem, we will
need a result known as Mertens’ first theorem.

Theorem 7.2.

$$\sum_{p \le n} \frac{\log p}{p} = \log n + O(1), \quad \text{as } n \to \infty.$$

We now prove Mertens’ two theorems.

Proof of Mertens’ first theorem. We will analyze the asymptotic behavior of $\log n!$
in two different ways. Comparing the two results will prove the theorem. First we
show that

$$\log n! = n\log n + O(n), \quad \text{as } n \to \infty. \tag{7.1}$$

We note that (7.1) follows from Stirling’s formula: $n! \sim n^n e^{-n}\sqrt{2\pi n}$. However, we
certainly don’t need such a precise estimate of $n!$ to obtain (7.1). We give a quick
direct proof of (7.1). Consider an integer $m \ge 2$ and $x \in [m-1, m]$. Integrating the
inequality $\log(m-1) \le \log x \le \log m$ over $x \in [m-1, m]$ gives

$$\log(m-1) \le \int_{m-1}^{m} \log x\, dx \le \log m,$$

which we rewrite as

$$0 \le \log m - \int_{m-1}^{m} \log x\, dx \le \log m - \log(m-1).$$

Summing this inequality from $m = 2$ to $m = n$, and noting that the resulting series
on the right hand side is telescopic, we obtain

$$0 \le \log n! - \int_{1}^{n} \log x\, dx \le \log n. \tag{7.2}$$

An integration by parts gives $\int_{1}^{n} \log x\, dx = n\log n - n + 1$. Substituting this
in (7.2) gives

$$n\log n - n + 1 \le \log n! \le n\log n - n + 1 + \log n,$$
which completes the proof of (7.1). 
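The two-sided bound just obtained for $\log n!$ is explicit, so it can be checked directly. The following Python sketch sums logarithms to get $\log n!$ and compares it with the two bounds; the function names are our own.

```python
import math

def log_factorial(n):
    """Return log(n!) as the sum of log m for 2 <= m <= n."""
    return sum(math.log(m) for m in range(2, n + 1))

def elementary_bounds(n):
    """Return the pair (n log n - n + 1, n log n - n + 1 + log n)."""
    lower = n * math.log(n) - n + 1
    return lower, lower + math.log(n)

for n in (10, 100, 1000):
    lower, upper = elementary_bounds(n)
    print(n, round(lower, 3), round(log_factorial(n), 3), round(upper, 3))
```

The gap between the two bounds is only $\log n$, far smaller than the $O(n)$ error term that (7.1) allows.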

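To analyze $\log n!$ in another way, we utilize the function $v_p(n)$ introduced in
Chap. 6. Recall that $v_p(n)$, the $p$-adic value of $n$, is equal to the largest exponent
$k$ such that $p^k \mid n$ and that by the definition of $v_p$, we have $n = \prod_{p} p^{v_p(n)} = \prod_{p \le m} p^{v_p(n)}$, for any integer $m$ that is greater than or equal to the largest prime
divisor of $n$. Recall that Proposition 6.3 states that

$$v_p(n!) = \sum_{k=1}^{\infty} \Big\lfloor \frac{n}{p^k} \Big\rfloor.$$

Thus, we have

$$n! = \prod_{p \le n} p^{v_p(n!)} = \prod_{p \le n} p^{\sum_{k=1}^{\infty} \lfloor \frac{n}{p^k} \rfloor},$$

and

$$\log n! = \sum_{p \le n} \sum_{k=1}^{\infty} \Big\lfloor \frac{n}{p^k} \Big\rfloor \log p = \sum_{p \le n} \Big\lfloor \frac{n}{p} \Big\rfloor \log p + \sum_{p \le n} \Big( \sum_{k=2}^{\infty} \Big\lfloor \frac{n}{p^k} \Big\rfloor \Big) \log p. \tag{7.3}$$

We now analyze the two terms on the right hand side of (7.3), beginning with the
second term. We have

$$\sum_{k=2}^{\infty} \Big\lfloor \frac{n}{p^k} \Big\rfloor \le n \sum_{k=2}^{\infty} \frac{1}{p^k} = \frac{n}{p(p-1)}.$$

Thus, we obtain

$$\sum_{p \le n} \Big( \sum_{k=2}^{\infty} \Big\lfloor \frac{n}{p^k} \Big\rfloor \Big) \log p \le n \sum_{p \le n} \frac{\log p}{p(p-1)} \le Cn, \tag{7.4}$$

for some constant $C > 0$, the latter inequality following from the fact that
$\sum_{p \le n} \frac{\log p}{p(p-1)} \le \sum_{m=2}^{\infty} \frac{\log m}{m(m-1)} < \infty$. We write the first term on the right hand
side of (7.3) as

$$\sum_{p \le n} \Big\lfloor \frac{n}{p} \Big\rfloor \log p = n \sum_{p \le n} \frac{\log p}{p} - \sum_{p \le n} \Big( \frac{n}{p} - \Big\lfloor \frac{n}{p} \Big\rfloor \Big) \log p. \tag{7.5}$$

Recalling that Theorem 6.2 gives $\theta(n) \le (\log 4)n$, we can estimate the second term
on the right hand side of (7.5) by

$$0 \le \sum_{p \le n} \Big( \frac{n}{p} - \Big\lfloor \frac{n}{p} \Big\rfloor \Big) \log p \le \sum_{p \le n} \log p = \theta(n) \le (\log 4)n. \tag{7.6}$$

From (7.3)-(7.6), we conclude that

$$\log n! = n \sum_{p \le n} \frac{\log p}{p} + O(n), \quad \text{as } n \to \infty. \tag{7.7}$$

Comparing (7.1) with (7.7) allows us to conclude that $\sum_{p \le n} \frac{\log p}{p} = \log n + O(1)$,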
completing the proof of Mertens’ first theorem. □ 
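Mertens’ first theorem says that $\sum_{p\le n}\frac{\log p}{p} - \log n$ stays bounded. A quick numerical look at this difference is given by the sketch below; the function names are illustrative, and the computation of course proves nothing about the limit behavior.

```python
import math

def primes_up_to(limit):
    """Return the list of primes p <= limit (sieve of Eratosthenes)."""
    if limit < 2:
        return []
    sieve = [True] * (limit + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, math.isqrt(limit) + 1):
        if sieve[i]:
            for j in range(i * i, limit + 1, i):
                sieve[j] = False
    return [i for i, flag in enumerate(sieve) if flag]

def mertens_first_gap(n):
    """Return sum_{p <= n} (log p)/p - log n."""
    return sum(math.log(p) / p for p in primes_up_to(n)) - math.log(n)

for n in (10**2, 10**3, 10**4, 10**5):
    print(n, round(mertens_first_gap(n), 4))
```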

In order to use Mertens’ first theorem to prove his second theorem, we need to 
introduce Abel summation, a tool that is used extensively in number theory. Abel 
summation is a discrete version of integration by parts. It appears in a variety of 
guises, the following of which is the most suitable in the present context. 

Proposition 7.1 (Abel Summation). Let $j_0, n \in \mathbb{Z}$ with $j_0 < n$. Let $a : [j_0, n] \cap \mathbb{Z} \to \mathbb{R}$,
and let $A : [j_0, n] \to \mathbb{R}$ be defined by $A(t) = \sum_{k=j_0}^{\lfloor t \rfloor} a(k)$. Let $f : [j_0, n] \to \mathbb{R}$ be continuously differentiable. Then

$$\sum_{j_0 < r \le n} a(r)f(r) = A(n)f(n) - A(j_0)f(j_0) - \int_{j_0}^{n} A(t)f'(t)\, dt. \tag{7.8}$$

Remark. Since $A(j_0) = a(j_0)$, we could also write the above formula in the more
compact form

$$\sum_{j_0 \le r \le n} a(r)f(r) = A(n)f(n) - \int_{j_0}^{n} A(t)f'(t)\, dt. \tag{7.9}$$

The form in the proposition of course mimics the standard integration by parts
formula.

Proof. Since $A$ is constant between integers, we have

$$\int_{j_0}^{n} A(t)f'(t)\, dt = \sum_{r=j_0}^{n-1} A(r)\big(f(r+1) - f(r)\big). \tag{7.10}$$

Substituting for $A$ in the last term on the right hand side, and interchanging the order
of the resulting summation, we obtain

$$\sum_{r=j_0}^{n-1} A(r)\big(f(r+1) - f(r)\big) = \sum_{r=j_0}^{n-1} \Big( \sum_{k=j_0}^{r} a(k) \Big)\big(f(r+1) - f(r)\big) = \sum_{k=j_0}^{n-1} a(k) \sum_{r=k}^{n-1} \big(f(r+1) - f(r)\big)$$

$$= \sum_{k=j_0}^{n-1} a(k)\big(f(n) - f(k)\big) = A(n-1)f(n) - \sum_{k=j_0}^{n-1} a(k)f(k). \tag{7.11}$$

From (7.10) and (7.11) we obtain

$$\int_{j_0}^{n} A(t)f'(t)\, dt = A(n-1)f(n) - \sum_{k=j_0}^{n-1} a(k)f(k).$$

Substituting this in the right hand side of (7.8) gives

$$A(n)f(n) - A(j_0)f(j_0) - \int_{j_0}^{n} A(t)f'(t)\, dt = A(n)f(n) - A(j_0)f(j_0) - A(n-1)f(n) + \sum_{k=j_0}^{n-1} a(k)f(k) = \sum_{k=j_0+1}^{n} a(k)f(k), \tag{7.12}$$


which proves the proposition. 


□ 
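Because $A$ is a step function, the integral in (7.8) can be evaluated exactly as in (7.10), which makes Abel summation easy to test numerically. The Python sketch below computes both sides of (7.8) for a user-supplied sequence `a` and function `f`; the function names and the sample choice $a(r) = 1/r$, $f = \log$ are assumptions made only for illustration.

```python
import math

def abel_right_hand_side(a, f, j0, n):
    """Return A(n)f(n) - A(j0)f(j0) - integral of A(t)f'(t) over [j0, n].

    The integral is evaluated via (7.10): A equals A(r) on [r, r+1), so each
    unit interval contributes A(r) * (f(r+1) - f(r)).
    """
    partial = 0.0   # running value of A(r)
    integral = 0.0
    for r in range(j0, n):
        partial += a(r)
        integral += partial * (f(r + 1) - f(r))
    a_total = partial + a(n)  # A(n)
    return a_total * f(n) - a(j0) * f(j0) - integral

def direct_sum(a, f, j0, n):
    """Left hand side of (7.8): the sum of a(r)f(r) over j0 < r <= n."""
    return sum(a(r) * f(r) for r in range(j0 + 1, n + 1))

lhs = direct_sum(lambda r: 1 / r, math.log, 1, 50)
rhs = abel_right_hand_side(lambda r: 1 / r, math.log, 1, 50)
print(math.isclose(lhs, rhs))
```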
Proof of Mertens’ second theorem. Let



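$$a(n) = \begin{cases} \frac{\log p}{p}, & \text{if } n = p \text{ is prime}; \\ 0, & \text{otherwise}, \end{cases}$$

and let

$$f(t) = \frac{1}{\log t}, \quad t > 1.$$

We use Abel summation in the form (7.9) with $j_0 = 2$. By Mertens’ first theorem,
we have

$$A(t) = \sum_{k=2}^{\lfloor t \rfloor} a(k) = \sum_{p \le \lfloor t \rfloor} \frac{\log p}{p} = \log t + O(1), \quad \text{as } t \to \infty. \tag{7.13}$$

Thus, we obtain from (7.9) and (7.13),

$$\sum_{p \le n} \frac{1}{p} = \sum_{p \le n} \frac{\log p}{p} \cdot \frac{1}{\log p} = \sum_{2 \le r \le n} a(r)f(r) = A(n)f(n) - \int_{2}^{n} A(t)f'(t)\, dt = \frac{\log n + O(1)}{\log n} + \int_{2}^{n} \frac{\log t + O(1)}{t(\log t)^2}\, dt. \tag{7.14}$$

We have

$$\int_{2}^{n} \frac{1}{t\log t}\, dt = \log\log t \Big|_{2}^{n} = \log\log n - \log\log 2,$$

and since $\int \frac{1}{t(\log t)^2}\, dt = -\frac{1}{\log t}$, we have

$$\int_{2}^{\infty} \frac{1}{t(\log t)^2}\, dt < \infty.$$

Using these two facts in (7.14) gives

$$\sum_{p \le n} \frac{1}{p} = \log\log n + O(1), \quad \text{as } n \to \infty, \tag{7.15}$$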

completing the proof of Mertens’ second theorem. 


□ 
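As with the first theorem, the bounded difference in Mertens’ second theorem can be observed numerically. The sketch below tabulates $\sum_{p\le n}\frac1p - \log\log n$; the helper names are our own, and the range of $n$ is kept small so that the code runs quickly.

```python
import math

def primes_up_to(limit):
    """Return the list of primes p <= limit (sieve of Eratosthenes)."""
    if limit < 2:
        return []
    sieve = [True] * (limit + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, math.isqrt(limit) + 1):
        if sieve[i]:
            for j in range(i * i, limit + 1, i):
                sieve[j] = False
    return [i for i, flag in enumerate(sieve) if flag]

def mertens_second_gap(n):
    """Return sum_{p <= n} 1/p - log log n (requires n >= 3)."""
    return sum(1 / p for p in primes_up_to(n)) - math.log(math.log(n))

for n in (10**2, 10**3, 10**4, 10**5):
    print(n, round(mertens_second_gap(n), 4))
```

The extremely slow growth of $\log\log n$ explains why the harmonic series of the primes diverges, but only barely.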
Exercise 7.1. (a) Use Mertens’ first theorem and Abel summation to prove that

$$\sum_{p \le n} \frac{\log^2 p}{p} = \frac{1}{2}\log^2 n + O(\log n).$$

(Hint: Write $\sum_{p \le n} \frac{\log^2 p}{p} = \sum_{1 < r \le n} a(r)\log r$, where $a(r)$ is as in the proof of
Mertens’ second theorem.)

(b) Use induction and the result in (a) to prove that

$$\sum_{p \le n} \frac{\log^k p}{p} = \frac{1}{k}\log^k n + O(\log^{k-1} n),$$

for all positive integers $k$.

Exercise 7.2. Proposition 6.1 in Chap. 6 showed that the two statements,
$\sum_{p \le n} \log p \sim n$ and $\pi(n) = \sum_{p \le n} 1 \sim \frac{n}{\log n}$, can easily be derived one from the
other. The prime number theorem cannot be derived from Mertens’ second theorem.
Derive Mertens’ second theorem in the form $\sum_{p \le n} \frac{1}{p} \sim \log\log n$ from the prime
number theorem, $\pi(n) \sim \frac{n}{\log n}$. (Hint: Use Abel summation.)

Chapter Notes 

The two theorems in this chapter were proven by F. Mertens in 1874. For some 
references for further reading, see the notes at the end of Chap. 8. 


Chapter 8 

The Hardy-Ramanujan Theorem 
on the Number of Distinct Prime Divisors 


Let $\omega(n)$ denote the number of distinct prime divisors of $n$; that is,

$$\omega(n) = \sum_{p \mid n} 1.$$

Thus, for example, $\omega(1) = 0$, $\omega(2) = 1$, $\omega(9) = 1$, $\omega(60) = 3$. The values
of $\omega(n)$ obviously fluctuate wildly as $n \to \infty$, since $\omega(p) = 1$, for every
prime $p$. However, there are not very many prime numbers, in the sense that
the asymptotic density of the primes is 0. In this chapter we prove the Hardy-Ramanujan
theorem, which in colloquial language states that “almost every” integer
$n$ has “approximately” $\log\log n$ distinct prime divisors. The meaning of “almost
every” is that the asymptotic density of those integers $n$ for which the number of
distinct prime divisors is not “approximately” $\log\log n$ is zero. The meaning of
“approximately” is that the actual number of distinct prime divisors of $n$ falls in
the interval $[\log\log n - (\log\log n)^{\frac12+\delta},\ \log\log n + (\log\log n)^{\frac12+\delta}]$, where $\delta > 0$ is
arbitrarily small.

Theorem 8.1 (Hardy-Ramanujan). For every $\delta > 0$,

$$\lim_{N\to\infty} \frac{|\{n \in [N] : |\omega(n) - \log\log n| \le (\log\log n)^{\frac12+\delta}\}|}{N} = 1. \tag{8.1}$$

Remark. From the proof of the theorem, it is very easy to infer that the statement of
the theorem is equivalent to the following statement: For every $\delta > 0$,

$$\lim_{N\to\infty} \frac{|\{n \in [N] : |\omega(n) - \log\log N| \le (\log\log N)^{\frac12+\delta}\}|}{N} = 1.$$

While the statement of the theorem is probably more aesthetically pleasing than
this latter statement, the latter statement is more practical. Thus, for example, take
$\delta = .1$. Then for sufficiently large $n$, a very high percentage of the positive integers
up to the astronomical number $N = e^{e^{n}}$ will have between $n - n^{.6}$ and $n + n^{.6}$ distinct
prime factors. Let $n = 10^9$. We leave it to the interested reader to estimate the
$O(1)$ terms appearing in the proofs of Mertens’ theorems, and to keep track of how
they appear in the proof of the Hardy-Ramanujan theorem below, and to conclude
that over ninety percent of the positive integers up to $N = e^{e^{10^9}}$ have between
$10^9 - (10^9)^{.6}$ and $10^9 + (10^9)^{.6}$ distinct prime factors. That is, over ninety percent
of the positive integers up to $e^{e^{10^9}}$ have between $10^9 - 251{,}188$ and $10^9 + 251{,}188$
distinct prime factors.

Our proof of the Hardy-Ramanujan theorem will have a probabilistic flavor. For
any positive integer $N$, let $P_N$ denote the uniform probability measure on $[N]$; that
is, $P_N(\{j\}) = \frac{1}{N}$, for $j \in [N]$. Then we may think of the distinct prime divisor
function $\omega = \omega(n)$ as a random variable on the space $[N]$ with the probability
measure $P_N$. For the sequel, note that when we write $P_N(\omega \in A)$, where $A \subset [N]$,
what we mean is

$$P_N(\omega \in A) = P_N(\{n \in [N] : \omega(n) \in A\}) = \frac{|\{n \in [N] : \omega(n) \in A\}|}{N}.$$

Let $E_N$ denote the expected value with respect to the measure $P_N$. The expected
value of $\omega$ is given by

$$E_N\omega = \frac{1}{N} \sum_{n=1}^{N} \omega(n). \tag{8.2}$$

The second moment of $\omega$ is given by

$$E_N\omega^2 = \frac{1}{N} \sum_{n=1}^{N} \omega^2(n). \tag{8.3}$$

The variance $\mathrm{Var}_N(\omega)$ of $\omega$ is defined by

$$\mathrm{Var}_N(\omega) = E_N(\omega - E_N\omega)^2 = E_N\omega^2 - (E_N\omega)^2. \tag{8.4}$$

We will prove the Hardy-Ramanujan theorem by applying Chebyshev’s inequality
to the random variable $\omega$:

$$P_N(|\omega - E_N\omega| \ge \lambda) \le \frac{\mathrm{Var}_N(\omega)}{\lambda^2}, \quad \text{for } \lambda > 0. \tag{8.5}$$

In order to implement this, we need to calculate $E_N\omega$ and $\mathrm{Var}_N(\omega)$ or, equivalently,
$E_N\omega$ and $E_N\omega^2$. The next two theorems give the asymptotic behavior as $N \to \infty$
of $E_N\omega$ and of $E_N\omega^2$. The proofs of these two theorems will use Mertens’ second
theorem.

Theorem 8.2.

$$E_N\omega = \log\log N + O(1), \quad \text{as } N \to \infty.$$

Remark. Recall the definition of the average order of an arithmetic function, given
in the remark following the number-theoretic proof of Theorem 2.1. Theorem 8.2
shows that the average order of $\omega$, the function counting the number of distinct
prime divisors, is given by the function $\log\log n$.

Proof. From the definition of the divisor function we have

$$\sum_{n=1}^{N} \omega(n) = \sum_{n=1}^{N} \sum_{p \mid n} 1 = \sum_{p \le N}\ \sum_{p \mid n,\ n \le N} 1 = \sum_{p \le N} \Big\lfloor \frac{N}{p} \Big\rfloor = N \sum_{p \le N} \frac{1}{p} - \sum_{p \le N} \Big( \frac{N}{p} - \Big\lfloor \frac{N}{p} \Big\rfloor \Big). \tag{8.6}$$

The second term above satisfies the inequality

$$0 \le \sum_{p \le N} \Big( \frac{N}{p} - \Big\lfloor \frac{N}{p} \Big\rfloor \Big) \le \sum_{p \le N} 1 = \pi(N) \le N. \tag{8.7}$$

(We could use Chebyshev’s theorem (Theorem 6.1) to get the better bound $O(\frac{N}{\log N})$
on the right hand side above, but that wouldn’t improve the order of the final bound
we obtain for $E_N\omega$.) Mertens’ second theorem (Theorem 7.1) gives

$$\sum_{p \le N} \frac{1}{p} = \log\log N + O(1), \quad \text{as } N \to \infty. \tag{8.8}$$

From (8.6)-(8.8), we obtain

$$\sum_{n=1}^{N} \omega(n) = N\log\log N + O(N), \quad \text{as } N \to \infty,$$

and dividing this by $N$ gives

$$E_N\omega = \log\log N + O(1), \quad \text{as } N \to \infty, \tag{8.9}$$

completing the proof of the theorem. □ 
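Theorem 8.2 can be illustrated by computing $\omega(n)$ for all $n \le N$ with a sieve and comparing the average with $\log\log N$. In the sketch below, `omega_table` is a hypothetical helper of our own; it adds 1 to every multiple of each prime, mirroring the interchange of summation in (8.6).

```python
import math

def omega_table(limit):
    """Return a list w with w[n] = omega(n) for 0 <= n <= limit (w[0] is unused)."""
    omega = [0] * (limit + 1)
    for p in range(2, limit + 1):
        if omega[p] == 0:  # no smaller prime divides p, so p is prime
            for multiple in range(p, limit + 1, p):
                omega[multiple] += 1
    return omega

def mean_omega(N):
    """Return E_N(omega), the average of omega(n) over 1 <= n <= N."""
    return sum(omega_table(N)[1:]) / N

for N in (10**3, 10**4, 10**5):
    print(N, round(mean_omega(N), 4), round(math.log(math.log(N)), 4))
```

The examples from the beginning of the chapter can be recovered from the table: `omega_table(60)[60]` equals 3.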
Theorem 8.3.

$$E_N\omega^2 = (\log\log N)^2 + O(\log\log N), \quad \text{as } N \to \infty.$$

Remark. To prove the Hardy-Ramanujan theorem, we only need the upper bound

$$E_N\omega^2 \le (\log\log N)^2 + O(\log\log N), \quad \text{as } N \to \infty. \tag{8.10}$$

Proof. We have

$$\omega^2(n) = \Big( \sum_{p \mid n} 1 \Big)^2 = \Big( \sum_{p_1 \mid n} 1 \Big)\Big( \sum_{p_2 \mid n} 1 \Big) = \sum_{\substack{p_1 p_2 \mid n \\ p_1 \ne p_2}} 1 + \sum_{p \mid n} 1 = \sum_{\substack{p_1 p_2 \mid n \\ p_1 \ne p_2}} 1 + \omega(n). \tag{8.11}$$

Thus,

$$\sum_{n=1}^{N} \omega^2(n) = \sum_{n=1}^{N} \sum_{\substack{p_1 p_2 \mid n \\ p_1 \ne p_2}} 1 + \sum_{n=1}^{N} \omega(n). \tag{8.12}$$

The second term on the right hand side of (8.12) can be estimated by Theorem 8.2,
giving

$$\sum_{n=1}^{N} \omega(n) = N E_N\omega = N\log\log N + O(N), \quad \text{as } N \to \infty. \tag{8.13}$$

To estimate the first term on the right hand side of (8.12), we write

$$\sum_{n=1}^{N} \sum_{\substack{p_1 p_2 \mid n \\ p_1 \ne p_2}} 1 = \sum_{\substack{p_1 p_2 \le N \\ p_1 \ne p_2}}\ \sum_{\substack{n \le N \\ p_1 p_2 \mid n}} 1 = \sum_{\substack{p_1 p_2 \le N \\ p_1 \ne p_2}} \Big\lfloor \frac{N}{p_1 p_2} \Big\rfloor = N \sum_{\substack{p_1 p_2 \le N \\ p_1 \ne p_2}} \frac{1}{p_1 p_2} - \sum_{\substack{p_1 p_2 \le N \\ p_1 \ne p_2}} \Big( \frac{N}{p_1 p_2} - \Big\lfloor \frac{N}{p_1 p_2} \Big\rfloor \Big). \tag{8.14}$$

The number of ordered pairs of distinct primes $(p_1, p_2)$ such that $p_1 p_2 \le N$ is of
course equal to twice the number of such unordered pairs $\{p_1, p_2\}$. The fundamental
theorem of arithmetic states that each integer has a unique factorization into primes;
thus, if $p_1 p_2 = p_3 p_4$, then necessarily $\{p_1, p_2\} = \{p_3, p_4\}$. Consequently the
number of unordered pairs $\{p_1, p_2\}$ such that $p_1 p_2 \le N$ is certainly no greater
than $N$. Thus, the second term on the right hand side of (8.14) satisfies

$$0 \le \sum_{\substack{p_1 p_2 \le N \\ p_1 \ne p_2}} \Big( \frac{N}{p_1 p_2} - \Big\lfloor \frac{N}{p_1 p_2} \Big\rfloor \Big) \le \sum_{\substack{p_1 p_2 \le N \\ p_1 \ne p_2}} 1 \le 2N. \tag{8.15}$$

Using Mertens’ second theorem for the second inequality below, we bound from
above the summation in the first term on the right hand side of (8.14) by

$$\sum_{\substack{p_1 p_2 \le N \\ p_1 \ne p_2}} \frac{1}{p_1 p_2} \le \Big( \sum_{p \le N} \frac{1}{p} \Big)^2 \le (\log\log N + O(1))^2, \quad \text{as } N \to \infty. \tag{8.16}$$

From (8.12)-(8.16), we conclude that (8.10) holds.

To complete the proof of the theorem, we need to show (8.10) with the reverse
inequality. The easiest way to do this is to note simply that the variance is a
nonnegative quantity. Thus,

$$E_N\omega^2 \ge (E_N\omega)^2 = (\log\log N + O(1))^2 = (\log\log N)^2 + O(\log\log N),$$

where the first equality follows from Theorem 8.2. For an alternative proof, see 
Exercise 8.1. □ 

We now use Chebyshev’s inequality along with the estimates in Theorems 8.2 
and 8.3 to prove the Hardy-Ramanujan theorem. 

Proof of Theorem 8.1. From Theorems 8.2 and 8.3 we have

$$\mathrm{Var}_N(\omega) = E_N\omega^2 - (E_N\omega)^2 = (\log\log N)^2 + O(\log\log N) - (\log\log N + O(1))^2 = O(\log\log N), \quad \text{as } N \to \infty. \tag{8.17}$$

Theorem 8.2 gives

$$E_N\omega = \log\log N + R_N, \quad \text{where } R_N \text{ is bounded as } N \to \infty. \tag{8.18}$$

Applying Chebyshev’s inequality with $\lambda = (\log\log N)^{\frac12+\delta}$, where $\delta > 0$, we obtain
from (8.5), (8.17), and (8.18)

$$P_N\big( |\omega - \log\log N - R_N| \ge (\log\log N)^{\frac12+\delta} \big) \le \frac{O(\log\log N)}{(\log\log N)^{1+2\delta}} \to 0, \quad \text{as } N \to \infty.$$

Thus,

$$\lim_{N\to\infty} P_N\big( |\omega - \log\log N - R_N| < (\log\log N)^{\frac12+\delta} \big) = 1. \tag{8.19}$$

Translating (8.19) back to the notation in the statement of the theorem, we have for
every $\delta > 0$

$$\lim_{N\to\infty} \frac{|\{n \in [N] : |\omega(n) - \log\log N - R_N| < (\log\log N)^{\frac12+\delta}\}|}{N} = 1. \tag{8.20}$$

The main difference between (8.20) and the statement of the Hardy-Ramanujan
theorem is that $\log\log N$ appears in (8.20) and $\log\log n$ appears in (8.1). Because
$\log\log x$ is such a slowly varying function, this difference is not very significant.
The remainder of the proof consists of showing that if (8.20) holds for all $\delta > 0$,
then (8.1) also holds for all $\delta > 0$.

Fix an arbitrary $\delta > 0$. Using the fact that (8.20) holds with $\delta$ replaced by $\frac{\delta}{2}$, we
will show that (8.1) holds for $\delta$. This will then complete the proof of the theorem.
The term $R_N$ in (8.20) may vary with $N$, but it is bounded in absolute value, say
by $M$. For $N^{\frac12} \le n \le N$, we have

$$\log\log N - \log\log n \le \log\log N - \log\log N^{\frac12} = \log 2. \tag{8.21}$$

Therefore, writing $\omega(n) - \log\log n = (\omega(n) - \log\log N - R_N) + (\log\log N - \log\log n) + R_N$, the triangle inequality and (8.21) give

$$|\omega(n) - \log\log n| \le |\omega(n) - \log\log N - R_N| + \log 2 + M, \quad \text{for } N^{\frac12} \le n \le N. \tag{8.22}$$

Using (8.20) with $\delta$ replaced by $\frac{\delta}{2}$, along with (8.22) and the fact that
$\lim_{N\to\infty} \frac{N^{\frac12}}{N} = 0$, we have

$$\lim_{N\to\infty} \frac{|\{n \in [N] : |\omega(n) - \log\log n| \le (\log\log N)^{\frac12+\frac{\delta}{2}} + \log 2 + M\}|}{N} = 1. \tag{8.23}$$

By (8.21), it follows that $(\log\log n)^{\frac12+\delta} \ge (\log\log N - \log 2)^{\frac12+\delta}$, for $N^{\frac12} \le n \le N$.
Clearly, we have

$$(\log\log N - \log 2)^{\frac12+\delta} \ge (\log\log N)^{\frac12+\frac{\delta}{2}} + \log 2 + M, \quad \text{for sufficiently large } N.$$

Thus,

$$(\log\log n)^{\frac12+\delta} \ge (\log\log N)^{\frac12+\frac{\delta}{2}} + \log 2 + M, \quad \text{for } N^{\frac12} \le n \le N \text{ and sufficiently large } N. \tag{8.24}$$

From (8.23), (8.24), and the fact that $\lim_{N\to\infty} \frac{N^{\frac12}}{N} = 0$, we conclude that

$$\lim_{N\to\infty} \frac{|\{n \in [N] : |\omega(n) - \log\log n| \le (\log\log n)^{\frac12+\delta}\}|}{N} = 1.$$


□ 
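The quantities appearing in the proof (the mean, the variance, and the proportion of integers whose number of distinct prime divisors lies within $(\log\log N)^{\frac12+\delta}$ of $\log\log N$) can be computed for moderate $N$. The sketch below uses the form of the theorem given in the remark after Theorem 8.1, with $\log\log N$ in place of $\log\log n$; the function names and the choices $N = 10^5$, $\delta = 0.1$ are our own assumptions. Since $\log\log N$ grows so slowly, such experiments only hint at the limiting behavior.

```python
import math

def omega_table(limit):
    """Return a list w with w[n] = omega(n) for 0 <= n <= limit (w[0] is unused)."""
    omega = [0] * (limit + 1)
    for p in range(2, limit + 1):
        if omega[p] == 0:  # no smaller prime divides p, so p is prime
            for multiple in range(p, limit + 1, p):
                omega[multiple] += 1
    return omega

def hardy_ramanujan_summary(N, delta):
    """Return (mean, variance, fraction) for omega under the uniform measure on [N].

    fraction is the proportion of n in [N] with
    |omega(n) - log log N| <= (log log N)^(1/2 + delta); requires N >= 3.
    """
    values = omega_table(N)[1:]  # omega(1), ..., omega(N)
    mean = sum(values) / N
    variance = sum(w * w for w in values) / N - mean ** 2
    center = math.log(math.log(N))
    radius = center ** (0.5 + delta)
    fraction = sum(abs(w - center) <= radius for w in values) / N
    return mean, variance, fraction

mean, variance, fraction = hardy_ramanujan_summary(10**5, 0.1)
print(round(mean, 4), round(variance, 4), round(fraction, 4))
```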


Exercise 8.1. Prove the lower bound

$$E_N\omega^2 \ge (\log\log N)^2 + O(\log\log N)$$

by using (8.12)-(8.15) and an inequality that begins with

$$\sum_{\substack{p_1 p_2 \le N \\ p_1 \ne p_2}} \frac{1}{p_1 p_2} \ge \sum_{\substack{p_1, p_2 \le \sqrt{N} \\ p_1 \ne p_2}} \frac{1}{p_1 p_2}.$$

Exercise 8.2. Let $\Omega(n)$ denote the number of prime divisors of $n$, counted with
repetitions. Thus, if the prime factorization of $n$ is given by $n = \prod_{i=1}^{m} p_i^{k_i}$, then
$\omega(n) = m$, but $\Omega(n) = \sum_{i=1}^{m} k_i$. Use the method of proof in Theorem 8.2 to prove
that

$$\frac{1}{N} \sum_{n=1}^{N} \Omega(n) = \log\log N + O(1), \quad \text{as } N \to \infty.$$

Exercise 8.3. Let $d(n)$ denote the number of divisors of $n$. Thus, $d(12) = 6$
because the divisors of 12 are 1, 2, 3, 4, 6, 12. Show that

$$\frac{1}{n} \sum_{j=1}^{n} d(j) = \log n + O(1).$$

This shows that the average order of the divisor function is the function $\log n$. Recall
from the remark after Theorem 8.2 that the average order of $\omega(n)$, the function
counting the number of distinct prime divisors, is the function $\log\log n$. (Hint: We
have $d(k) = \sum_{m \mid k} 1$, so $\sum_{k=1}^{n} d(k) = \sum_{k \in [n]} \sum_{m \mid k} 1 = \sum_{m \in [n]} \lfloor \frac{n}{m} \rfloor$.)


Chapter Notes 

The theorem of G. H. Hardy and S. Ramanujan was proved in 1917. The proof we
give is along the lines of the 1934 proof of P. Turán, which is much simpler than the
original proof. For more on multiplicative number theory and primes, the subject
of the material in Chaps. 6-8, the reader is referred to Nathanson’s book [27] and
to the more advanced treatment of Tenenbaum in [33]. In [27] one can find a proof
of the prime number theorem by “elementary” methods. For very accessible books
on analytic number theory and a proof of the prime number theorem using analytic
function theory, see, for example, Apostol’s book [5] or Jameson’s book [25]. For
a somewhat more advanced treatment, see the book of Montgomery and Vaughan
[26]. One can also find a proof of the prime number theorem using analytic function
theory, as well as a whole trove of sophisticated material, in [33].


Chapter 9 

The Largest Clique in a Random Graph 
and Applications to Tampering Detection 
and Ramsey Theory 


9.1 Graphs and Random Graphs: Basic Definitions 

A finite graph $G$ is a pair $(V, E)$, where $V$ is a finite set of vertices and $E$ is a
subset of $V^{(2)}$, the set of unordered pairs of elements of $V$. The elements of $E$
are called edges. (This is what graph theorists call a simple graph. That is, there
are no loops, meaning edges connecting a vertex to itself, and there are no multiple edges,
meaning more than one edge connecting the same pair of vertices.) If $x, y \in V$ and the pair
$\{x, y\} \in E$, then we say that an edge joins the vertices $x$ and $y$; otherwise, we say
that there is no edge joining $x$ and $y$. If $|V| = n$, then $|V^{(2)}| = \binom{n}{2} = \frac12 n(n-1)$. The
size of the graph is the number of vertices it contains, that is, $|V|$. We will identify
the vertex set $V$ of a graph of size $n$ with $[n]$. The graph $G = (V, E)$ with $|V| = n$
and $E = V^{(2)}$ is called the complete graph of size $n$ and is henceforth denoted
by $K_n$. This graph has $n$ vertices and an edge connects every one of the $\frac12 n(n-1)$
pairs of vertices. See Fig. 9.1.

For a graph $G = (V, E)$ of size $n$, a clique of size $k \in [n]$ is a complete subgraph
$K$ of $G$ of size $k$; that is, $K = (V_K, E_K)$, where $V_K \subset V$, $|V_K| = k$ and $E_K = V_K^{(2)} \subset E$.
See Fig. 9.2.

Consider the vertex set $V = [n]$. Now construct the edge set $E \subset [n]^{(2)}$ in the
following random fashion. Let $p \in (0, 1)$. For each pair $\{x, y\} \in [n]^{(2)}$, toss a coin
with probability $p$ of heads and $1 - p$ of tails. If heads occurs, include the pair $\{x, y\}$
in $E$, and if tails occurs, do not include it in $E$. Do this independently for every
pair $\{x, y\} \in [n]^{(2)}$. Denote the resulting random edge set by $E_n(p)$. The resulting
random graph is sometimes called an Erdős-Rényi graph; it will be denoted by
$G_n(p) = ([n], E_n(p))$. In this chapter, the generic notation $P$ for probability and $E$
for expectation will be used throughout.

Fig. 9.1 The complete graph with 5 vertices, $G = K_5$

Fig. 9.2 A graph with 10 vertices and 13 edges. The largest clique is the one of size 4, formed by
the vertices {4, 5, 6, 7}

To get a feeling for how many edges one expects to see in the random graph,
attach to each of the $N = \frac12 n(n-1)$ potential edges a random variable which is
equal to 1 if the edge exists in the random set of edges $E_n(p)$ and is equal to 0
if the edge does not exist in $E_n(p)$. Denote these random variables by $\{W_m\}_{m=1}^{N}$.
The random variables are distributed according to the Bernoulli distribution with
parameter $p$; that is, $P(W_m = 1) = 1 - P(W_m = 0) = p$. Thus, the expectation
and the variance of $W_m$ are given by $EW_m = p$ and $\sigma^2(W_m) = p(1-p)$. Let $S_N = \sum_{m=1}^{N} W_m$ denote the number of edges in the random graph. By the linearity of the
expectation, one has $ES_N = Np$. Because edges have been selected independently,
the random variables $\{W_m\}_{m=1}^{N}$ are independent. Thus, the variance of $S_N$ is the sum
of the variances of $\{W_m\}_{m=1}^{N}$; that is, $\sigma^2(S_N) = Np(1-p)$. Therefore, Chebyshev’s
inequality gives

$$P\big( |S_N - Np| \ge N^{\frac12+\epsilon} \big) \le \frac{Np(1-p)}{N^{1+2\epsilon}}.$$

Consequently, for any $\epsilon > 0$, one has $\lim_{N\to\infty} P(|S_N - Np| \ge N^{\frac12+\epsilon}) = 0$. Thus,
for any $\epsilon > 0$ and large $n$ (depending on $\epsilon$), with high probability the Erdős-Rényi
graph $G_n(p)$ will have $\frac12 n^2 p + O(n^{1+\epsilon})$ edges.
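The concentration of the number of edges around $Np$ is easy to observe by simulation. The following Python sketch samples the edge count $S_N$ of $G_n(p)$ by tossing one coin per pair of vertices; the function name `count_edges`, the parameters $n = 300$, $p = 0.3$, and the fixed seed are arbitrary choices made for illustration.

```python
import random

def count_edges(n, p, rng):
    """Sample the random graph G_n(p) and return its number of edges S_N."""
    edges = 0
    for x in range(1, n + 1):
        for y in range(x + 1, n + 1):
            if rng.random() < p:  # independent coin toss for the pair {x, y}
                edges += 1
    return edges

n, p = 300, 0.3
N = n * (n - 1) // 2               # number of potential edges
rng = random.Random(2014)          # fixed seed (arbitrary) for reproducibility
S = count_edges(n, p, rng)
# Deviation from the mean, measured in units of sqrt(N).
print(N, N * p, S, abs(S - N * p) / N ** 0.5)
```

Since the standard deviation of $S_N$ is $\sqrt{Np(1-p)}$, the last printed number is typically of order 1.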

The main question we address in this chapter is this: how large is the largest
complete subgraph, that is, the largest clique, in $G_n(p)$, as $n \to \infty$? We study this
question in Sect. 9.2. In Sect. 9.3 we apply the results of Sect. 9.2 to a problem in
tampering detection. In Sect. 9.4, we discuss Ramsey theory for cliques in graphs
and use random graphs to give a bound on the size of a fundamental deterministic
quantity.


9.2 The Size of the Largest Clique in a Random Graph 

Let L n p be the random variable denoting the size of the largest clique in G u ( p ). Let 
log^ 2) n log i logi n. 

p p p 

Theorem 9.1. Let L n _ p denote the size of the largest clique in the Erdos-Renyi 
graph G n (p). Then 



> 21ogi n — c 

p 



0, ifc < 2; 

1, ifc > 2. 


Remark. Despite the increasing randomness and disorder in $G_n(p)$ as $n$ grows,
the theorem shows that $L_{n,p}$ behaves almost deterministically: with probability
approaching 1 as $n \to \infty$, the size of the largest clique will be very close to
$2\log_{1/p} n - 2\log^{(2)}_{1/p} n$. In fact, it is known that for each $n$, there exists a value $d_n$
such that $\lim_{n\to\infty} P(L_{n,p} \text{ equals either } d_n \text{ or } d_n + 1) = 1$. That is, with probability
approaching 1 as $n \to \infty$, $L_{n,p}$ is restricted to two specific values. The proof of this
is similar to the proof of Theorem 9.1 but a little more delicate; see [9]. We have
chosen the formulation in Theorem 9.1 in particular because it is natural for the
topic discussed in Sect. 9.3.
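
To get a feel for the scale in Theorem 9.1, the following Python sketch evaluates the centering $2\log_{1/p} n - 2\log^{(2)}_{1/p} n$ and, for one small sample of $G_n(p)$, finds the largest clique by exhaustive branch and bound. It is only an illustration: the helper names are hypothetical, and the values of $n$ that exhaustive search can handle are far too small for the asymptotic statement to be sharp, so close agreement should not be expected.

```python
import math
import random
from itertools import combinations


def log_base_inv_p(x, p):
    """Logarithm of x to the base 1/p."""
    return math.log(x) / math.log(1 / p)


def typical_clique_size(n, p):
    """The centering 2 log_{1/p} n - 2 log^{(2)}_{1/p} n from the remark."""
    l1 = log_base_inv_p(n, p)
    return 2 * l1 - 2 * log_base_inv_p(l1, p)


def largest_clique(n, edges):
    """Size of the largest clique in the graph on vertices 0..n-1 with the given edges."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    best = 0

    def grow(size, candidates):
        # 'candidates' are the vertices adjacent to every vertex of the current clique.
        nonlocal best
        best = max(best, size)
        if size + len(candidates) <= best:
            return  # even using every candidate cannot beat the best clique found
        for v in sorted(candidates):
            grow(size + 1, {w for w in candidates if w > v and w in adj[v]})

    grow(0, set(range(n)))
    return best


if __name__ == "__main__":
    rng = random.Random(1)          # fixed seed, chosen only for repeatability
    n, p = 40, 0.5                  # illustrative values
    edges = [e for e in combinations(range(n), 2) if rng.random() < p]
    print("largest clique in one sample:", largest_clique(n, edges))
    print("2 log n - 2 log^(2) n       :", typical_clique_size(n, p))
```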

Let $N_{n,p}(k)$ be the random variable denoting the number of cliques of size
$k$ in the random graph $G_n(p)$. We will always assume tacitly that the argument
of $N_{n,p}$ is a positive integer. Of course it follows from Theorem 9.1 that
$\lim_{n\to\infty} P(N_{n,p}(k_n) = 0) = 1$, if $k_n \ge 2\log_{1/p} n - c\log^{(2)}_{1/p} n$, for some $c < 2$.
We say then that the random variable $N_{n,p}(k_n)$ converges in probability to 0 as
$n \to \infty$. The proof of Theorem 9.1 will actually show that if
$k_n \le 2\log_{1/p} n - c\log^{(2)}_{1/p} n$, for some $c > 2$, then
$\lim_{n\to\infty} P(N_{n,p}(k_n) \ge M) = 1$, for any $M \in \mathbb{R}$.
We say then that the random variable $N_{n,p}(k_n)$ converges in probability to $\infty$ as
$n \to \infty$. We record this as a corollary.

Corollary 9.1.

i. If $k_n \ge 2\log_{1/p} n - c\log^{(2)}_{1/p} n$, for some $c < 2$, then $N_{n,p}(k_n)$ converges to 0 in
probability; that is,

\[
\lim_{n\to\infty} P(N_{n,p}(k_n) = 0) = 1;
\]

ii. If $k_n \le 2\log_{1/p} n - c\log^{(2)}_{1/p} n$, for some $c > 2$, then $N_{n,p}(k_n)$ converges to $\infty$ in
probability; that is,

\[
\lim_{n\to\infty} P(N_{n,p}(k_n) \ge M) = 1, \quad \text{for all } M \in \mathbb{R}.
\]

Proof of Theorem 9.1. The number of cliques of size $k_n$ in the complete graph $K_n$
is $\binom{n}{k_n}$; denote these cliques by $\{K^n_j : j = 1, \ldots, \binom{n}{k_n}\}$. Let $I_{K^n_j}$ be the indicator
random variable defined to be equal to 1 or 0, according to whether the clique $K^n_j$ is
or is not contained in the random graph $G_n(p)$. Then we can represent the random
variable $N_{n,p}(k_n)$, denoting the number of cliques of size $k_n$ in the random graph
$G_n(p)$, as

\[
N_{n,p}(k_n) = \sum_{j=1}^{\binom{n}{k_n}} I_{K^n_j}. \tag{9.1}
\]

Let $P(K^n_j)$ denote the probability that the clique $K^n_j$ is contained in $G_n(p)$; that is,
the probability that the edges of the clique $K^n_j$ are all contained in the random edge
set $E_n(p)$ of $G_n(p)$. Since each clique $K^n_j$ contains $\binom{k_n}{2}$ edges, we have

\[
P(K^n_j) = p^{\binom{k_n}{2}}.
\]

The expected value $EI_{K^n_j}$ of $I_{K^n_j}$ is given by $EI_{K^n_j} = P(K^n_j)$. Thus, the expected
value of $N_{n,p}(k_n)$ is given by

\[
EN_{n,p}(k_n) = \sum_{j=1}^{\binom{n}{k_n}} EI_{K^n_j} = \binom{n}{k_n} p^{\binom{k_n}{2}}. \tag{9.2}
\]

We will first prove that if $c < 2$, then

\[
\lim_{n\to\infty} P\big(L_{n,p} \ge 2\log_{1/p} n - c\log^{(2)}_{1/p} n\big) = 0. \tag{9.3}
\]

We have

\[
EN_{n,p}(k_n) \ge P(N_{n,p}(k_n) \ge 1) = P(L_{n,p} \ge k_n),
\]

where the equality follows from the fact that a clique of size $l$ contains sub-cliques
of size $j$ for all $j \in [l-1]$. Thus, to prove (9.3) it suffices to prove that

\[
\lim_{n\to\infty} EN_{n,p}\big(2\log_{1/p} n - c_n\log^{(2)}_{1/p} n\big) = 0, \tag{9.4}
\]

where $0 \le c_n \le c < 2$, for all $n$. (We have written $c_n$ instead of $c$ in (9.4) because
we need the argument of $N_{n,p}$ to be an integer.) This approach to proving (9.3) is
known as the first moment method.

To prove (9.4), we need the following lemma.
Lemma 9.1. If $k_n = o(n^{\frac12})$, as $n \to \infty$, then

\[
\binom{n}{k_n} \sim \frac{n^{k_n}}{k_n!}, \quad \text{as } n \to \infty.
\]

Proof. We have $\binom{n}{k_n} = \frac{n(n-1)\cdots(n-k_n+1)}{k_n!}$. Thus, to prove the lemma we need to show
that

\[
\lim_{n\to\infty} \frac{n(n-1)\cdots(n-k_n+1)}{n^{k_n}} = 1,
\]

or, equivalently,

\[
\lim_{n\to\infty} \sum_{j=1}^{k_n-1} \log\Big(1 - \frac{j}{n}\Big) = 0. \tag{9.5}
\]

Letting $f(x) = -\log(1-x)$, and applying Taylor's remainder theorem in the form
$f(x) = f(0) + f'(x^*(x))x$, for $x > 0$, where $x^*(x) \in (0, x)$, we have

\[
0 \le -\log(1-x) \le 2x, \quad 0 \le x \le \frac12.
\]

Thus, for $n$ sufficiently large so that $\frac{k_n}{n} \le \frac12$, we have

\[
0 \le -\sum_{j=1}^{k_n-1} \log\Big(1 - \frac{j}{n}\Big) \le 2\sum_{j=1}^{k_n-1} \frac{j}{n} = \frac{(k_n-1)k_n}{n}.
\]

Letting $n \to \infty$ in the above equation, and using the assumption that $k_n = o(n^{\frac12})$,
we obtain (9.5). □
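
The approximation in Lemma 9.1 is easy to probe numerically, since the ratio of $\binom{n}{k}$ to $n^k/k!$ is exactly the product $\prod_{j=1}^{k-1}(1 - j/n)$ appearing in the proof. The Python sketch below (the function name is ours) computes this product and compares it with the lower bound $e^{-k(k-1)/n}$ that the proof's inequality yields.

```python
import math


def binomial_ratio(n, k):
    """C(n, k) divided by n^k / k!, computed as the product of (1 - j/n), j = 1..k-1."""
    ratio = 1.0
    for j in range(1, k):
        ratio *= 1 - j / n
    return ratio


if __name__ == "__main__":
    for n in (10**3, 10**5, 10**7):
        k = int(n ** (1 / 3))       # an illustrative choice with k = o(n^(1/2))
        lower = math.exp(-k * (k - 1) / n)
        print(n, k, binomial_ratio(n, k), lower)
```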

We can now prove (9.4). Let $k_n = 2\log_{1/p} n - c_n\log^{(2)}_{1/p} n$, where $0 \le c_n \le c < 2$,
for all $n$. Stirling's formula gives

\[
k_n! \sim k_n^{k_n} e^{-k_n} \sqrt{2\pi k_n}, \quad \text{as } n \to \infty.
\]

Using this with Lemma 9.1 and (9.2), we have

\[
EN_{n,p}(k_n) = \binom{n}{k_n} p^{\binom{k_n}{2}} \sim \frac{n^{k_n}}{k_n!}\, p^{\frac{k_n(k_n-1)}{2}}
\sim \frac{n^{k_n}\, p^{\frac{k_n(k_n-1)}{2}}}{k_n^{k_n} e^{-k_n} \sqrt{2\pi k_n}}, \quad \text{as } n \to \infty,
\]

and thus

\[
\log_{1/p} EN_{n,p}(k_n) = k_n\log_{1/p} n - \frac12 k_n^2 + \frac12 k_n - k_n\log_{1/p} k_n + k_n\log_{1/p} e
- \frac12\log_{1/p} 2\pi k_n + o(1), \quad \text{as } n \to \infty. \tag{9.6}
\]

Note that

\[
\begin{aligned}
\log_{1/p} k_n &= \log_{1/p}\big(2\log_{1/p} n - c_n\log^{(2)}_{1/p} n\big)
= \log_{1/p}\Big((\log_{1/p} n)\Big(2 - \frac{c_n\log^{(2)}_{1/p} n}{\log_{1/p} n}\Big)\Big)\\
&= \log^{(2)}_{1/p} n + \log_{1/p}\Big(2 - \frac{c_n\log^{(2)}_{1/p} n}{\log_{1/p} n}\Big)
= \log^{(2)}_{1/p} n + O(1), \quad \text{as } n \to \infty.
\end{aligned} \tag{9.7}
\]

Substituting for $k_n$ and using (9.7), we have

\[
\begin{aligned}
k_n\log_{1/p} n - \frac12 k_n^2 - k_n\log_{1/p} k_n
&= \big(2\log_{1/p} n - c_n\log^{(2)}_{1/p} n\big)\log_{1/p} n
- \frac12\big(2\log_{1/p} n - c_n\log^{(2)}_{1/p} n\big)^2\\
&\quad - \big(2\log_{1/p} n - c_n\log^{(2)}_{1/p} n\big)\big(\log^{(2)}_{1/p} n + O(1)\big)\\
&= (c_n - 2)(\log_{1/p} n)\log^{(2)}_{1/p} n + O(\log_{1/p} n).
\end{aligned} \tag{9.8}
\]

Since $\frac12 k_n + k_n\log_{1/p} e - \frac12\log_{1/p} 2\pi k_n = O(\log_{1/p} n)$, it follows from (9.6), (9.8), and
the fact that $0 \le c_n \le c < 2$ that

\[
\lim_{n\to\infty} \log_{1/p} EN_{n,p}\big(2\log_{1/p} n - c_n\log^{(2)}_{1/p} n\big) = -\infty.
\]

Thus, (9.4) holds, completing the proof of (9.3).
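
The first moment computation can be made concrete by evaluating $EN_{n,p}(k) = \binom{n}{k}p^{\binom{k}{2}}$ from (9.2) directly. The Python sketch below works on a logarithmic scale to avoid overflow and locates the smallest $k$ at which the expected number of cliques drops below 1; the function names and the choice $n = 1000$, $p = \frac12$ are assumptions made for illustration.

```python
import math


def log_expected_cliques(n, k, p):
    """Natural logarithm of E N_{n,p}(k) = C(n, k) p^{C(k, 2)}, via log-gamma."""
    log_binom = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_binom + math.comb(k, 2) * math.log(p)


def first_moment_cutoff(n, p):
    """Smallest k <= n with E N_{n,p}(k) < 1, or n + 1 if there is none."""
    for k in range(1, n + 1):
        if log_expected_cliques(n, k, p) < 0:
            return k
    return n + 1


if __name__ == "__main__":
    n, p = 1000, 0.5                # illustrative values
    k0 = first_moment_cutoff(n, p)
    for k in range(k0 - 3, k0 + 2):
        print(k, math.exp(log_expected_cliques(n, k, p)))
```

By the inequality $EN_{n,p}(k) \ge P(L_{n,p} \ge k)$ used above, cliques of size well beyond the cutoff are unlikely; the expected count falls off very quickly once it passes below 1.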
We now prove that if $c > 2$, then

\[
\lim_{n\to\infty} P\big(L_{n,p} \ge 2\log_{1/p} n - c\log^{(2)}_{1/p} n\big) = 1. \tag{9.9}
\]

The analysis in the above paragraph shows that if $c_n \ge c > 2$, for all $n$, then

\[
\lim_{n\to\infty} EN_{n,p}\big(2\log_{1/p} n - c_n\log^{(2)}_{1/p} n\big) = \infty. \tag{9.10}
\]

The first moment method used above exploits the fact that (9.4) implies (9.3).
Now (9.10) does not imply (9.9). To prove (9.9), we employ the second moment
method. (This method was also used in Chap. 3 and Chap. 8.) The variance of
$N_{n,p}(k_n)$ is given by

\[
\mathrm{Var}(N_{n,p}(k_n)) = E\big(N_{n,p}(k_n) - EN_{n,p}(k_n)\big)^2 = EN^2_{n,p}(k_n) - \big(EN_{n,p}(k_n)\big)^2. \tag{9.11}
\]

Our goal now is to show that if $k_n = 2\log_{1/p} n - c_n\log^{(2)}_{1/p} n$ with $c_n \ge c > 2$, for all
$n$, then

\[
\mathrm{Var}(N_{n,p}(k_n)) = o\big((EN_{n,p}(k_n))^2\big), \quad \text{as } n \to \infty. \tag{9.12}
\]

Chebyshev's inequality gives for any $\epsilon > 0$

\[
P\big(|N_{n,p}(k_n) - EN_{n,p}(k_n)| \ge \epsilon\,|EN_{n,p}(k_n)|\big) \le \frac{\mathrm{Var}(N_{n,p}(k_n))}{\epsilon^2 (EN_{n,p}(k_n))^2}. \tag{9.13}
\]

Thus, (9.12) and (9.13) yield

\[
\lim_{n\to\infty} P\Big(\Big|\frac{N_{n,p}(k_n)}{EN_{n,p}(k_n)} - 1\Big| < \epsilon\Big) = 1, \quad \text{for all } \epsilon > 0. \tag{9.14}
\]

From (9.14) and (9.10), it follows that

\[
\lim_{n\to\infty} P(N_{n,p}(k_n) \ge M) = 1, \quad \text{for all } M \in \mathbb{R}. \tag{9.15}
\]

In particular then, (9.9) follows from (9.15). Thus, the proof of the theorem will be
complete when we prove (9.12), or, in light of (9.11), when we prove that

\[
EN^2_{n,p}(k_n) = \big(EN_{n,p}(k_n)\big)^2 + o\big((EN_{n,p}(k_n))^2\big), \quad \text{as } n \to \infty. \tag{9.16}
\]

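We relabel the cliques $\{K^n_j : j = 1, \ldots, \binom{n}{k_n}\}$ of size $k_n$ in $K_n$ according to
the vertices that are contained in each clique. Thus, we write $K^n_{i_1,i_2,\ldots,i_{k_n}}$ to denote
the clique whose vertices are $i_1, i_2, \ldots, i_{k_n}$. The representation for $N_{n,p}(k_n)$ in (9.1)
becomes

\[
N_{n,p}(k_n) = \sum_{1 \le i_1 < i_2 < \cdots < i_{k_n} \le n} I_{K^n_{i_1,i_2,\ldots,i_{k_n}}}. \tag{9.17}
\]

Note that the random variable $I_{K^n_{i_1,i_2,\ldots,i_{k_n}}} I_{K^n_{l_1,l_2,\ldots,l_{k_n}}}$ is equal to 1 if the edges of the
two cliques $K^n_{i_1,i_2,\ldots,i_{k_n}}$ and $K^n_{l_1,l_2,\ldots,l_{k_n}}$ are all contained in $G_n(p)$ and is equal to 0
otherwise. Thus,

\[
E\, I_{K^n_{i_1,i_2,\ldots,i_{k_n}}} I_{K^n_{l_1,l_2,\ldots,l_{k_n}}} = P\big(K^n_{i_1,i_2,\ldots,i_{k_n}} \cup K^n_{l_1,l_2,\ldots,l_{k_n}}\big),
\]

where $P(K^n_{i_1,i_2,\ldots,i_{k_n}} \cup K^n_{l_1,l_2,\ldots,l_{k_n}})$ is the probability that the edges of $K^n_{i_1,i_2,\ldots,i_{k_n}}$ and
$K^n_{l_1,l_2,\ldots,l_{k_n}}$ are all contained in the random edge set $E_n(p)$ of $G_n(p)$. Consequently,
we have

\[
EN^2_{n,p}(k_n)
= \sum_{\substack{1 \le i_1 < i_2 < \cdots < i_{k_n} \le n\\ 1 \le l_1 < l_2 < \cdots < l_{k_n} \le n}} E\, I_{K^n_{i_1,i_2,\ldots,i_{k_n}}} I_{K^n_{l_1,l_2,\ldots,l_{k_n}}}
= \sum_{\substack{1 \le i_1 < i_2 < \cdots < i_{k_n} \le n\\ 1 \le l_1 < l_2 < \cdots < l_{k_n} \le n}} P\big(K^n_{i_1,i_2,\ldots,i_{k_n}} \cup K^n_{l_1,l_2,\ldots,l_{k_n}}\big).
\]

Now by symmetry considerations, it follows that the sum

\[
\sum_{1 \le l_1 < l_2 < \cdots < l_{k_n} \le n} P\big(K^n_{i_1,i_2,\ldots,i_{k_n}} \cup K^n_{l_1,l_2,\ldots,l_{k_n}}\big) \tag{9.18}
\]

over all $k_n$-tuples $1 \le l_1 < l_2 < \cdots < l_{k_n} \le n$ is independent of the particular
choice of $k_n$-tuple $i_1, i_2, \ldots, i_{k_n}$. (The reader should verify this.) For convenience,
we select the $k_n$-tuple $1, 2, \ldots, k_n$. Since there are $\binom{n}{k_n}$ different $k_n$-tuples, we have

\[
EN^2_{n,p}(k_n) = \binom{n}{k_n} \sum_{1 \le l_1 < l_2 < \cdots < l_{k_n} \le n} P\big(K^n_{1,2,\ldots,k_n} \cup K^n_{l_1,l_2,\ldots,l_{k_n}}\big). \tag{9.19}
\]

Let

\[
J = J(l_1, l_2, \ldots, l_{k_n}) = \big|[k_n] \cap \{l_1, l_2, \ldots, l_{k_n}\}\big|
\]

denote the number of vertices shared by the cliques $K^n_{1,2,\ldots,k_n}$ and $K^n_{l_1,l_2,\ldots,l_{k_n}}$. Each
of these two cliques has $\binom{k_n}{2}$ edges. Since the cliques share $J$ vertices, the number
of edges in $K^n_{1,2,\ldots,k_n} \cup K^n_{l_1,l_2,\ldots,l_{k_n}}$ is equal to $2\binom{k_n}{2} - \binom{J}{2}$, if $J \ge 2$, and is equal to
$2\binom{k_n}{2}$, if $J = 0$ or $J = 1$. Thus,

\[
P\big(K^n_{1,2,\ldots,k_n} \cup K^n_{l_1,l_2,\ldots,l_{k_n}}\big) =
\begin{cases}
p^{2\binom{k_n}{2} - \binom{J}{2}}, & \text{if } J = J(l_1, l_2, \ldots, l_{k_n}) \ge 2;\\
p^{2\binom{k_n}{2}}, & \text{if } J = J(l_1, l_2, \ldots, l_{k_n}) \le 1.
\end{cases} \tag{9.20}
\]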

Substituting (9.20) into (9.19), we have

\[
EN^2_{n,p}(k_n) = \binom{n}{k_n} \sum_{\substack{1 \le l_1 < l_2 < \cdots < l_{k_n} \le n\\ J(l_1,l_2,\ldots,l_{k_n}) \ge 2}} p^{2\binom{k_n}{2} - \binom{J}{2}}
+ \binom{n}{k_n} \sum_{\substack{1 \le l_1 < l_2 < \cdots < l_{k_n} \le n\\ J(l_1,l_2,\ldots,l_{k_n}) \le 1}} p^{2\binom{k_n}{2}}. \tag{9.21}
\]

Keep in mind that our aim is to prove (9.16). We will do this by showing that
the first term on the right hand side of (9.21) is equal to $o\big((EN_{n,p}(k_n))^2\big)$ and
that the second term on the right hand side of (9.21) is equal to $(EN_{n,p}(k_n))^2 +
o\big((EN_{n,p}(k_n))^2\big)$.

In order to analyze the two terms on the right hand side of (9.21), we need to
count the number of $k_n$-tuples $l_1, l_2, \ldots, l_{k_n}$ for which $J(l_1, l_2, \ldots, l_{k_n}) = j$, for
$j = 0, 1, \ldots, k_n$. Denote this number by $\#(j)$. In order that $J(l_1, l_2, \ldots, l_{k_n}) = j$,
we need to choose $j$ of the vertices of $l_1, l_2, \ldots, l_{k_n}$ from the set $[k_n]$ and the other
$k_n - j$ vertices of $l_1, l_2, \ldots, l_{k_n}$ from the set $[n] - [k_n]$. Thus,

\[
\#(j) = \binom{k_n}{j}\binom{n - k_n}{k_n - j}. \tag{9.22}
\]
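
The count in (9.22) can be checked by brute force for small parameters. The Python sketch below (helper names are ours) enumerates all $k$-subsets of $\{1, \ldots, n\}$, tallies them by the size of their overlap with $\{1, \ldots, k\}$, and compares the tallies with $\binom{k}{j}\binom{n-k}{k-j}$. Summing over $j$ recovers $\binom{n}{k}$, the total number of $k$-subsets.

```python
from itertools import combinations
from math import comb


def overlap_counts(n, k):
    """For each j, count the k-subsets of {1..n} sharing exactly j elements with {1..k}."""
    base = set(range(1, k + 1))
    counts = [0] * (k + 1)
    for subset in combinations(range(1, n + 1), k):
        counts[len(base.intersection(subset))] += 1
    return counts


def overlap_formula(n, k):
    """The values #(j) = C(k, j) C(n - k, k - j), j = 0..k, from (9.22)."""
    return [comb(k, j) * comb(n - k, k - j) for j in range(k + 1)]


if __name__ == "__main__":
    n, k = 9, 4                     # illustrative values
    print(overlap_counts(n, k))
    print(overlap_formula(n, k))
```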


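We first show that the second term on the right hand side of (9.21) is equal to
$(EN_{n,p}(k_n))^2 + o\big((EN_{n,p}(k_n))^2\big)$. Using (9.22), we have

\[
\begin{aligned}
\binom{n}{k_n} \sum_{\substack{1 \le l_1 < l_2 < \cdots < l_{k_n} \le n\\ J(l_1,l_2,\ldots,l_{k_n}) \le 1}} p^{2\binom{k_n}{2}}
&= \binom{n}{k_n}\Big[\binom{k_n}{0}\binom{n-k_n}{k_n} + \binom{k_n}{1}\binom{n-k_n}{k_n-1}\Big] p^{2\binom{k_n}{2}}\\
&= \binom{n}{k_n}\Big[\binom{n-k_n}{k_n} + k_n\binom{n-k_n}{k_n-1}\Big] p^{2\binom{k_n}{2}}\\
&= \big(EN_{n,p}(k_n)\big)^2\, \frac{\binom{n-k_n}{k_n} + k_n\binom{n-k_n}{k_n-1}}{\binom{n}{k_n}},
\end{aligned} \tag{9.23}
\]

where (9.2) was used for the final equality. By Lemma 9.1, $\binom{n}{k_n} \sim \frac{n^{k_n}}{k_n!}$, and applying
Lemma 9.1 with $n$ replaced by $n - k_n$, we have
$\binom{n-k_n}{k_n} \sim \frac{(n-k_n)^{k_n}}{k_n!} = \frac{n^{k_n}}{k_n!}\big(1 - \frac{k_n}{n}\big)^{k_n} \sim \frac{n^{k_n}}{k_n!}$,
since $k_n = o(n^{\frac12})$. Of course then also $\binom{n-k_n}{k_n-1} \sim \frac{n^{k_n-1}}{(k_n-1)!}$. Thus,

\[
\frac{\binom{n-k_n}{k_n} + k_n\binom{n-k_n}{k_n-1}}{\binom{n}{k_n}}
\sim \frac{\frac{n^{k_n}}{k_n!} + k_n\frac{n^{k_n-1}}{(k_n-1)!}}{\frac{n^{k_n}}{k_n!}}
= 1 + \frac{k_n^2}{n} \sim 1, \quad \text{as } n \to \infty. \tag{9.24}
\]

From (9.23) and (9.24), we conclude that the second term on the right hand side
of (9.21) is equal to $(EN_{n,p}(k_n))^2 + o\big((EN_{n,p}(k_n))^2\big)$.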

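Now we consider the first term on the right hand side of (9.21). Of course,
$\binom{n-k_n}{k_n-j} \le \frac{n^{k_n-j}}{(k_n-j)!}$ and $\binom{k_n}{j} \le \frac{k_n^j}{j!}$. Also, by Lemma 9.1, $\binom{n}{k_n} \sim \frac{n^{k_n}}{k_n!}$. Using these
estimates and (9.22), and recalling from (9.2) that $(EN_{n,p}(k_n))^2 = \binom{n}{k_n}^2 p^{2\binom{k_n}{2}}$,
we can estimate the first term on the right hand side of (9.21) by

\[
\begin{aligned}
\binom{n}{k_n} \sum_{\substack{1 \le l_1 < l_2 < \cdots < l_{k_n} \le n\\ J(l_1,l_2,\ldots,l_{k_n}) \ge 2}} p^{2\binom{k_n}{2} - \binom{J}{2}}
&= \binom{n}{k_n} \sum_{j=2}^{k_n} \binom{k_n}{j}\binom{n-k_n}{k_n-j} p^{2\binom{k_n}{2} - \binom{j}{2}}\\
&= \big(EN_{n,p}(k_n)\big)^2 \sum_{j=2}^{k_n} \frac{\binom{k_n}{j}\binom{n-k_n}{k_n-j}}{\binom{n}{k_n}}\, p^{-\binom{j}{2}}\\
&\le (1 + o(1))\big(EN_{n,p}(k_n)\big)^2 \sum_{j=2}^{k_n} \frac{k_n^j\, n^{k_n-j}\, k_n!}{(k_n-j)!\, j!\, n^{k_n}}\, p^{-\binom{j}{2}}\\
&\le (1 + o(1))\big(EN_{n,p}(k_n)\big)^2 \sum_{j=2}^{k_n} \frac{k_n^{2j}}{n^j\, j!}\, p^{-\frac{j(j-1)}{2}},
\end{aligned} \tag{9.25}
\]

where the last inequality uses $\frac{k_n!}{(k_n-j)!} \le k_n^j$.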

By Stirling's formula, $j! \sim j^j e^{-j}\sqrt{2\pi j}$, as $j \to \infty$, and thus there exists a
constant $C > 0$ such that

\[
j! \ge C j^j e^{-j}, \quad \text{for all } j \ge 2. \tag{9.26}
\]

It is easy to check that $j p^{\frac{j-1}{2}}$ is decreasing in $j$ for $j$ sufficiently large. Using this
and the fact that $\lim_{j\to\infty} j p^{\frac{j-1}{2}} = 0$, it follows that

\[
\min_{2 \le j \le k_n} j p^{\frac{j-1}{2}} = k_n p^{\frac{k_n-1}{2}}, \quad \text{for sufficiently large } k_n. \tag{9.27}
\]

Using (9.26) for the first inequality below and (9.27) for the second inequality below,
for sufficiently large $n$ the summation in the last term on the right hand side of (9.25)
can be estimated by

\[
\begin{aligned}
\sum_{j=2}^{k_n} \frac{k_n^{2j}}{n^j\, j!}\, p^{-\frac{j(j-1)}{2}}
&\le \frac1C \sum_{j=2}^{k_n} \Big(\frac{e k_n^2}{j\, n\, p^{\frac{j-1}{2}}}\Big)^j
\le \frac1C \sum_{j=2}^{k_n} \Big(\frac{e k_n^2}{n\, k_n\, p^{\frac{k_n-1}{2}}}\Big)^j\\
&= \frac1C \sum_{j=2}^{k_n} \Big(\frac{\sqrt{p}\, e k_n}{n p^{\frac{k_n}{2}}}\Big)^j
\le \frac1C\, \frac{\rho_n^2}{1 - \rho_n}, \quad \text{if } \rho_n := \frac{\sqrt{p}\, e k_n}{n p^{\frac{k_n}{2}}} < 1.
\end{aligned} \tag{9.28}
\]

Using the fact that $k_n = 2\log_{1/p} n - c_n\log^{(2)}_{1/p} n$ with $c_n \ge c > 2$, we now show
that

\[
\lim_{n\to\infty} \rho_n = \sqrt{p}\, e \lim_{n\to\infty} \frac{k_n}{n p^{\frac{k_n}{2}}} = 0. \tag{9.29}
\]

Using (9.7) (which of course holds for $\{c_n\}_{n=1}^{\infty}$ as above) for the second equality
below, we have

\[
\begin{aligned}
\log_{1/p} \frac{k_n}{n p^{\frac{k_n}{2}}}
&= \log_{1/p} k_n - \log_{1/p} n - \frac{k_n}{2}\log_{1/p} p
= \log^{(2)}_{1/p} n + O(1) - \log_{1/p} n + \frac{k_n}{2}\\
&= \log^{(2)}_{1/p} n + O(1) - \log_{1/p} n + \log_{1/p} n - \frac{c_n}{2}\log^{(2)}_{1/p} n
= \Big(1 - \frac{c_n}{2}\Big)\log^{(2)}_{1/p} n + O(1),
\end{aligned}
\]

as $n \to \infty$. Since $c_n \ge c > 2$, it follows from this that
$\lim_{n\to\infty} \log_{1/p} \frac{k_n}{n p^{k_n/2}} = -\infty$ and
consequently that (9.29) holds. From (9.25), (9.28), and (9.29) we conclude that the
first term on the right hand side of (9.21) is indeed $o\big((EN_{n,p}(k_n))^2\big)$. This completes the
proof of Theorem 9.1. □
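
Combining (9.19), (9.20), and (9.22) gives an exact finite formula for the ratio that the second moment method controls, namely $EN^2_{n,p}(k)/(EN_{n,p}(k))^2 = \binom{n}{k}^{-1}\sum_{j=0}^{k}\binom{k}{j}\binom{n-k}{k-j}p^{-\binom{j}{2}}$. The Python sketch below evaluates it; the function name and the sample values of $n$, $p$ and $c$ are assumptions made for illustration. The ratio is always at least 1, and (9.16) says it tends to 1 under the hypotheses of the theorem, though the convergence in $n$ can be slow because the relevant scale is $\log^{(2)}_{1/p} n$.

```python
import math
from math import comb


def second_moment_ratio(n, k, p):
    """E N^2 / (E N)^2 for the number N of k-cliques in G_n(p), from (9.19)-(9.22)."""
    total = comb(n, k)
    return sum(
        comb(k, j) * comb(n - k, k - j) / total * p ** (-comb(j, 2))
        for j in range(k + 1)
    )


if __name__ == "__main__":
    p, c = 0.5, 3.0                 # illustrative values, with c > 2
    for n in (10**3, 10**4, 10**6):
        log_n = math.log(n) / math.log(1 / p)
        log2_n = math.log(log_n) / math.log(1 / p)
        k = int(2 * log_n - c * log2_n)
        print(n, k, second_moment_ratio(n, k, p))
```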


9.3 Detecting Tampering in a Random Graph 

The tampering detection problem we discuss is intimately related to Theorem 9.1
and Corollary 9.1. Consider the random graph $G_n(p) = ([n], E_n(p))$. Of course,
$E_n(p) \subset [n]^{(2)}$ is a random subset of $[n]^{(2)}$. Consider now the complete graph $K_n$
whose edge set is $[n]^{(2)}$. Let $k_n$ satisfy $1 \le k_n \le n$. There are $\binom{n}{k_n}$ different cliques
of size $k_n$ in $K_n$. We choose one of these $\binom{n}{k_n}$ cliques at random and “add” all of
its edges to the random edge set $E_n(p)$ (of course some of these additional edges
might already be in $E_n(p)$); that is, we take the union of $E_n(p)$ and the edges of the
randomly chosen clique. We denote this new augmented edge set by $E^{\mathrm{tam};k_n}_n(p)$ and
denote the corresponding tampered graph by $G^{\mathrm{tam};k_n}_n(p)$. See Fig. 9.3.

The question we ask is whether one can detect the tampering asymptotically as
$n \to \infty$. Of course, we need to define what we mean by detecting the tampering.
For this we need to define a distance between measures.

Consider a finite set $\Omega$ and consider probability measures $\mu$ and $\nu$ on $\Omega$. We
define the total variation distance between $\mu$ and $\nu$ by

\[
D_{TV}(\mu, \nu) := \max_{A \subset \Omega} |\mu(A) - \nu(A)|. \tag{9.30}
\]


Fig. 9.3 The graph from Fig. 9.2 of size $n = 10$ has been tampered with by adding to it the clique
of size $k_n = 3$ formed by the vertices {3, 6, 10}


In Exercise 9.1, the reader is asked to show that the distance $D_{TV}(\mu, \nu)$ can be
written in two other fashions:

\[
D_{TV}(\mu, \nu) = \max_{A \subset \Omega} \big(\mu(A) - \nu(A)\big) = \frac12 \sum_{x \in \Omega} |\mu(x) - \nu(x)|. \tag{9.31}
\]

It is easy to see that $D_{TV}(\mu, \nu)$ takes on values in $[0, 1]$, vanishes if and only if
$\mu = \nu$, and equals 1 if and only if $\mu$ and $\nu$ are mutually singular. We recall that
two probability measures $\mu$ and $\nu$ are called mutually singular if there exists a subset
$A \subset \Omega$ such that $\mu(A) = \nu(\Omega - A) = 1$ (and then of course $\mu(\Omega - A) = \nu(A) = 0$).
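
On a small finite set the equivalent expressions for the total variation distance can be compared directly. The Python sketch below is an illustration with hypothetical function names: measures are represented as dictionaries from points to probabilities, one function evaluates the half-$L^1$ formula in (9.31), and the other maximizes over all events as in (9.30), which is feasible only for very small $\Omega$.

```python
from itertools import combinations


def tv_half_l1(mu, nu):
    """Half the sum over points of |mu(x) - nu(x)|, as in (9.31)."""
    points = set(mu) | set(nu)
    return 0.5 * sum(abs(mu.get(x, 0.0) - nu.get(x, 0.0)) for x in points)


def tv_max_over_events(mu, nu):
    """Maximum of |mu(A) - nu(A)| over all subsets A, as in (9.30)."""
    points = sorted(set(mu) | set(nu))
    best = 0.0
    for r in range(len(points) + 1):
        for event in combinations(points, r):
            diff = sum(mu.get(x, 0.0) - nu.get(x, 0.0) for x in event)
            best = max(best, abs(diff))
    return best


if __name__ == "__main__":
    mu = {"a": 0.5, "b": 0.3, "c": 0.2}     # illustrative measures
    nu = {"a": 0.2, "b": 0.3, "c": 0.5}
    print(tv_half_l1(mu, nu), tv_max_over_events(mu, nu))
```

For mutually singular measures both functions return 1, and for identical measures both return 0, matching the properties listed above.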

Consider now an $\Omega$-valued random variable $X$ (defined on some probability space
$(S, P)$). The random variable $X$ induces a probability measure $\mu_X$ on $\Omega$, namely
for any subset $A \subset \Omega$, we define $\mu_X(A) = P(X \in A)$. This probability measure is
called the distribution of $X$. Given two random variables $X, Y$ taking values in $\Omega$,
we define the total variation distance between them by

\[
D_{TV}(X, Y) = D_{TV}(\mu_X, \mu_Y).
\]


We now apply the above concepts to the random graph. The original random
graph $G_n(p)$ has as its edge set $E_n(p)$, whereas the tampered random graph
$G^{\mathrm{tam};k_n}_n(p)$ has the augmented edge set $E^{\mathrm{tam};k_n}_n(p)$. Each of the random variables
$E_n(p)$ and $E^{\mathrm{tam};k_n}_n(p)$ takes values in the space $\mathcal{P}([n]^{(2)}) := 2^{[n]^{(2)}}$, the set of all
subsets of $[n]^{(2)}$. (Given a set $A$, the set of all subsets of $A$ is sometimes denoted by
$2^A$; it is known as the power set of $A$.) We define the tamper detection problem as
follows.

Definition.

i. If

\[
\lim_{n\to\infty} D_{TV}\big(E_n(p), E^{\mathrm{tam};k_n}_n(p)\big) = 0,
\]

we say that the tampering is strongly undetectable.

ii. If

\[
\lim_{n\to\infty} D_{TV}\big(E_n(p), E^{\mathrm{tam};k_n}_n(p)\big) = 1,
\]

we say that the tampering is detectable.

iii. If

\[
\liminf_{n\to\infty} D_{TV}\big(E_n(p), E^{\mathrm{tam};k_n}_n(p)\big) > 0 \quad \text{and} \quad \limsup_{n\to\infty} D_{TV}\big(E_n(p), E^{\mathrm{tam};k_n}_n(p)\big) < 1,
\]

we say that the tampering is weakly undetectable.

We will prove the following theorem.

Theorem 9.2. Consider the Erdős-Rényi graph $G_n(p)$ with random edge set
$E_n(p)$ and consider the tampered graph $G^{\mathrm{tam};k_n}_n(p)$ obtained by choosing at random
a clique of size $k_n$ from the complete graph $K_n$ and adjoining its edges to $E_n(p)$ to
create the augmented edge set $E^{\mathrm{tam};k_n}_n(p)$.

i. If $k_n \ge 2\log_{1/p} n - c\log^{(2)}_{1/p} n$, for some $c < 2$, then the tampering is detectable;
that is, $\lim_{n\to\infty} D_{TV}\big(E_n(p), E^{\mathrm{tam};k_n}_n(p)\big) = 1$.

ii. If $k_n \le 2\log_{1/p} n - c\log^{(2)}_{1/p} n$, for some $c > 2$, then the tampering is strongly
undetectable; that is, $\lim_{n\to\infty} D_{TV}\big(E_n(p), E^{\mathrm{tam};k_n}_n(p)\big) = 0$.

Remark. In light of Corollary 9.1, Theorem 9.2 seems quite intuitive. Indeed, if
$k_n \ge 2\log_{1/p} n - c\log^{(2)}_{1/p} n$, with $c < 2$, then $N_{n,p}(k_n)$, the number of cliques of
size $k_n$ in the random graph $G_n(p)$, converges to 0 in probability. However, by
construction, the tampered graph will always have such a clique. Thus, clearly, one
can distinguish between the corresponding measures. On the other hand if
$k_n \le 2\log_{1/p} n - c\log^{(2)}_{1/p} n$, with $c > 2$, then $N_{n,p}(k_n)$ converges to $\infty$ in probability. That
is, for arbitrary $M$, the number of cliques of size $k_n$ in $G_n(p)$ will be larger than
$M$ with probability approaching 1 as $n \to \infty$. Since the tampered graph $G^{\mathrm{tam};k_n}_n(p)$
is obtained from the original graph $G_n(p)$ by adjoining a randomly chosen clique
of size $k_n$ from the complete graph $K_n$, and since the number of cliques of size
$k_n$ in $G_n(p)$ grows unboundedly as $n \to \infty$ with probability approaching 1, it
seems intuitive that the addition of a single randomly chosen clique would hardly
be felt, and that asymptotically, the two graphs would be indistinguishable. Despite
the above intuition, which leads to the correct answer in the present situation, there
are situations in which this intuition leads one astray. See the notes at the end of
the chapter.

Proof. For notational clarity at a certain point in the proof, we will denote $N_{n,p}(k_n)$,
the random variable denoting the number of cliques of size $k_n$ in the random graph
$G_n(p)$, by $N^{(k_n)}_{n,p}$. For the proof of part (ii) of the theorem, we will need the weak
law of large numbers for the random variable $N^{(k_n)}_{n,p}$:

If $k_n \le 2\log_{1/p} n - c\log^{(2)}_{1/p} n$, for some $c > 2$, then for all $\epsilon > 0$,

\[
\lim_{n\to\infty} P\Big(\Big|\frac{N^{(k_n)}_{n,p}}{EN^{(k_n)}_{n,p}} - 1\Big| < \epsilon\Big) = 1. \tag{9.32}
\]

This result was actually proved in the course of the proof of Theorem 9.1; it appears
as (9.14).

Let $\mu_n$ denote the distribution of the random variable $E_n(p)$ and let $\mu_{n;\mathrm{tam}}$ denote
the distribution of the random variable $E^{\mathrm{tam};k_n}_n(p)$. Let $\{K^n_j : j = 1, \ldots, \binom{n}{k_n}\}$
denote the $\binom{n}{k_n}$ cliques of size $k_n$ in the complete graph $K_n$. Recall that $\mathcal{P}([n]^{(2)})$
denotes the set of subsets of $[n]^{(2)}$; thus, a point $\omega \in \mathcal{P}([n]^{(2)})$ is a subset of
$[n]^{(2)}$, while a subset $A \subset \mathcal{P}([n]^{(2)})$ is a collection of subsets of $[n]^{(2)}$. Denote by
$A^n_j \subset \mathcal{P}([n]^{(2)})$ the subset of $\mathcal{P}([n]^{(2)})$ consisting of all those subsets of $[n]^{(2)}$ which
contain all of the $\binom{k_n}{2}$ edges of the clique $K^n_j$. Let $A^n = \cup_{j=1}^{\binom{n}{k_n}} A^n_j \subset \mathcal{P}([n]^{(2)})$ denote
the set of all those subsets of $[n]^{(2)}$ which possess at least one clique of size $k_n$.
The tampered graph is obtained by choosing at random one of the $\binom{n}{k_n}$ cliques of
size $k_n$ in $K_n$ and adding all of its edges to the original random edge set $E_n(p)$. That
is, one of the $K^n_j$, $j = 1, \ldots, \binom{n}{k_n}$, is chosen at random, and its edges are adjoined to
$E_n(p)$ to form $E^{\mathrm{tam};k_n}_n(p)$. Of course then, by construction, the tampered edge set
$E^{\mathrm{tam};k_n}_n(p)$ must possess a clique of size $k_n$; thus,

\[
E^{\mathrm{tam};k_n}_n(p) \in A^n. \tag{9.33}
\]

We first prove part (i) of the theorem. Let $k_n \ge 2\log_{1/p} n - c\log^{(2)}_{1/p} n$, for some
$c < 2$. By Corollary 9.1 (or Theorem 9.1), the probability of there being at least one
clique of size $k_n$ in $E_n(p)$ converges to 0 as $n \to \infty$; thus,

\[
\lim_{n\to\infty} \mu_n(A^n) = 0.
\]

On the other hand, by (9.33),

\[
\mu_{n;\mathrm{tam}}(A^n) = 1, \quad \text{for all } n.
\]

Consequently,

\[
\begin{aligned}
D_{TV}\big(E_n(p), E^{\mathrm{tam};k_n}_n(p)\big) = D_{TV}(\mu_n, \mu_{n;\mathrm{tam}})
&= \max_{A \subset \mathcal{P}([n]^{(2)})} |\mu_n(A) - \mu_{n;\mathrm{tam}}(A)|\\
&\ge |\mu_n(A^n) - \mu_{n;\mathrm{tam}}(A^n)| = 1 - \mu_n(A^n) \to 1, \quad \text{as } n \to \infty,
\end{aligned}
\]

proving part (i).

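We now prove part (ii). The conditional $\mu_n$-probability that a set $A \subset \mathcal{P}([n]^{(2)})$
occurs given that the set $A^n_j \subset \mathcal{P}([n]^{(2)})$ occurs is denoted by $\mu_n(A|A^n_j)$ and is
given by $\mu_n(A|A^n_j) = \frac{\mu_n(A \cap A^n_j)}{\mu_n(A^n_j)}$. From the description of the construction of the
tampered graph in the first paragraph of this section, along with the fact that under
$\mu_n$ the existence of any particular edge is independent of the existence of any other
particular edges, it follows that

\[
\mu_{n;\mathrm{tam}}(A) = \frac{1}{\binom{n}{k_n}} \sum_{j=1}^{\binom{n}{k_n}} \mu_n(A|A^n_j), \quad \text{for } A \subset \mathcal{P}([n]^{(2)}). \tag{9.34}
\]

(The reader should verify this.)

For a point $\omega \in \mathcal{P}([n]^{(2)})$, we write $\{\omega\} \subset \mathcal{P}([n]^{(2)})$ to denote the subset of
$\mathcal{P}([n]^{(2)})$ consisting of the singleton $\omega$. Note that

\[
\mu_n(\{\omega\} \cap A^n_j) =
\begin{cases}
\mu_n(\{\omega\}), & \text{if } \omega \in A^n_j;\\
0, & \text{otherwise.}
\end{cases}
\]

Consequently, from the definition of $N^{(k_n)}_{n,p}$ and the definition of $\{A^n_j : j =
1, \ldots, \binom{n}{k_n}\}$, we have

\[
\sum_{j=1}^{\binom{n}{k_n}} \mu_n(\{\omega\} \cap A^n_j) = N^{(k_n)}_{n,p}(\omega)\, \mu_n(\{\omega\}), \quad \omega \in \mathcal{P}([n]^{(2)}). \tag{9.35}
\]

Note that $\mu_n(A^n_j) = p^{\binom{k_n}{2}}$, for all $j$. Recall from (9.2) that $EN^{(k_n)}_{n,p} = \binom{n}{k_n} p^{\binom{k_n}{2}}$.
Using these facts with (9.34) and (9.35), we have

\[
\mu_{n;\mathrm{tam}}(\{\omega\}) = \frac{1}{\binom{n}{k_n}} \sum_{j=1}^{\binom{n}{k_n}} \mu_n(\{\omega\}|A^n_j)
= \frac{1}{\binom{n}{k_n}} \sum_{j=1}^{\binom{n}{k_n}} \frac{\mu_n(\{\omega\} \cap A^n_j)}{\mu_n(A^n_j)}
= \frac{N^{(k_n)}_{n,p}(\omega)\, \mu_n(\{\omega\})}{\binom{n}{k_n} p^{\binom{k_n}{2}}}
= \frac{N^{(k_n)}_{n,p}(\omega)}{EN^{(k_n)}_{n,p}}\, \mu_n(\{\omega\}). \tag{9.36}
\]

Equation (9.36) shows that the probability measure $\mu_{n;\mathrm{tam}}$ is the probability
measure $\mu_n$, tilted by the random variable $N^{(k_n)}_{n,p}$.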
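
The tilting identity (9.36) can be verified exhaustively for very small graphs. The Python sketch below is an illustration with a hypothetical function name: it builds the distribution of the tampered edge set directly from its definition (choose a clique uniformly at random, take the union with an independent copy of $E_n(p)$), and compares it point by point with $\frac{N(\omega)}{EN}\mu_n(\{\omega\})$, where $N(\omega)$ is the number of $k$-cliques contained in the edge set $\omega$. The enumeration runs over all $2^{\binom{n}{2}}$ edge sets, so only tiny $n$ are practical.

```python
from itertools import combinations
from math import comb


def tilt_check(n, k, p):
    """Largest discrepancy between the tampered law and the tilted law in (9.36)."""
    edges = list(combinations(range(n), 2))
    cliques = [frozenset(combinations(c, 2)) for c in combinations(range(n), k)]

    def mu(omega):
        # Probability that the untampered edge set E_n(p) equals omega.
        return p ** len(omega) * (1 - p) ** (len(edges) - len(omega))

    all_sets = [frozenset(s) for r in range(len(edges) + 1)
                for s in combinations(edges, r)]

    # Law of the tampered edge set, straight from the construction.
    mu_tam = {omega: 0.0 for omega in all_sets}
    for omega in all_sets:
        for clique in cliques:
            mu_tam[omega | clique] += mu(omega) / len(cliques)

    expected_count = comb(n, k) * p ** comb(k, 2)
    worst = 0.0
    for omega in all_sets:
        count = sum(1 for clique in cliques if clique <= omega)
        worst = max(worst, abs(mu_tam[omega] - count / expected_count * mu(omega)))
    return worst


if __name__ == "__main__":
    print(tilt_check(4, 3, 0.3))    # illustrative values; should be rounding error only
```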

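For $\epsilon > 0$, let

\[
B^n_\epsilon = \Big\{\omega \in \mathcal{P}([n]^{(2)}) : \Big|\frac{N^{(k_n)}_{n,p}(\omega)}{EN^{(k_n)}_{n,p}} - 1\Big| < \epsilon\Big\}.
\]

Since $k_n \le 2\log_{1/p} n - c\log^{(2)}_{1/p} n$, for some $c > 2$, it follows from the law of large
numbers in (9.32) that

\[
\lim_{n\to\infty} \mu_n(B^n_\epsilon) = 1. \tag{9.37}
\]

From (9.36), we have

\[
|\mu_{n;\mathrm{tam}}(B^n_\epsilon) - \mu_n(B^n_\epsilon)|
= \Big|\sum_{\omega \in B^n_\epsilon} \mu_n(\{\omega\})\Big(\frac{N^{(k_n)}_{n,p}(\omega)}{EN^{(k_n)}_{n,p}} - 1\Big)\Big|
\le \epsilon \sum_{\omega \in B^n_\epsilon} \mu_n(\{\omega\}) \le \epsilon, \tag{9.38}
\]

where the first inequality follows from the definition of $B^n_\epsilon$. From (9.37) and (9.38),
it follows that

\[
\liminf_{n\to\infty} \mu_{n;\mathrm{tam}}(B^n_\epsilon) \ge 1 - \epsilon. \tag{9.39}
\]

Now let $A \subset \mathcal{P}([n]^{(2)})$ be arbitrary. Note that (9.38) holds also with $B^n_\epsilon$ replaced
by $A \cap B^n_\epsilon$; so $|\mu_{n;\mathrm{tam}}(A \cap B^n_\epsilon) - \mu_n(A \cap B^n_\epsilon)| \le \epsilon$. Let $(B^n_\epsilon)^c = \mathcal{P}([n]^{(2)}) - B^n_\epsilon$
denote the complement of $B^n_\epsilon$. Then we have

\[
\begin{aligned}
|\mu_n(A) - \mu_{n;\mathrm{tam}}(A)|
&= \big|\mu_n(A \cap B^n_\epsilon) + \mu_n(A \cap (B^n_\epsilon)^c) - \mu_{n;\mathrm{tam}}(A \cap B^n_\epsilon) - \mu_{n;\mathrm{tam}}(A \cap (B^n_\epsilon)^c)\big|\\
&\le |\mu_n(A \cap B^n_\epsilon) - \mu_{n;\mathrm{tam}}(A \cap B^n_\epsilon)| + \mu_n(A \cap (B^n_\epsilon)^c) + \mu_{n;\mathrm{tam}}(A \cap (B^n_\epsilon)^c)\\
&\le \epsilon + \mu_n((B^n_\epsilon)^c) + \mu_{n;\mathrm{tam}}((B^n_\epsilon)^c).
\end{aligned} \tag{9.40}
\]

From (9.40) and the definition of the total variation distance, it follows that

\[
\begin{aligned}
D_{TV}\big(E_n(p), E^{\mathrm{tam};k_n}_n(p)\big) = D_{TV}(\mu_n, \mu_{n;\mathrm{tam}})
&= \max_{A \subset \mathcal{P}([n]^{(2)})} |\mu_n(A) - \mu_{n;\mathrm{tam}}(A)|\\
&\le \epsilon + \mu_n((B^n_\epsilon)^c) + \mu_{n;\mathrm{tam}}((B^n_\epsilon)^c).
\end{aligned} \tag{9.41}
\]

From (9.37), (9.39), (9.41), and the fact that $\epsilon > 0$ is arbitrary, we conclude that

\[
\lim_{n\to\infty} D_{TV}\big(E_n(p), E^{\mathrm{tam};k_n}_n(p)\big) = 0. \tag{9.42}
\]

□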

Remark. The final two paragraphs of the proof can be replaced by a shorter argument
using $L^2$-convergence and the Cauchy-Schwarz inequality. See Exercise 9.2.


9.4 Ramsey Theory 

Consider the complete graph $K_n$. For each edge in $K_n$, choose either blue or red,
and color the edge with that color. We call this a 2-coloring of $K_n$. For $2 \le k \le n$,
one can ask whether there exists a monochromatic clique of size $k$, that is, a clique
with all of its edges blue or with all of its edges red. For $k = 2$, obviously there
exists such a monochromatic clique, for all $n \ge 2$. The fundamental theorem of
Ramsey theory states the following:

For each integer $k \ge 3$, there exists an integer $R(k) > k$ such that if $n \ge R(k)$,
then every 2-coloring of $K_n$ will necessarily have a monochromatic clique of size
$k$, while if $k \le n < R(k)$, then it is possible to find a 2-coloring of $K_n$ with no
monochromatic clique of size $k$.

Fig. 9.4 The above example shows that $R(3) > 5$

Note that this result is purely deterministic: it says that no matter how we arrange
the coloring of $K_n$, there must be a monochromatic clique of size $k$, if $n \ge R(k)$.
The exact computation of the Ramsey numbers $R(k)$ is notoriously hard. One has
$R(3) = 6$ and $R(4) = 18$, but the exact value of $R(5)$ is unknown! See Fig. 9.4.

Remark. It is known that $43 \le R(5) \le 49$. The complete graph $K_{43}$ has
$\frac12 \cdot 43 \cdot 42 = 903$ edges. There are $2^{903}$ different two-colorings of $K_{43}$ and
$\binom{43}{5} = 962{,}598$ different cliques of size 5.
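
The counts in the remark explain why exhaustive search is hopeless for $R(5)$, whereas the smallest case $R(3) = 6$ is well within reach of a computer. The Python sketch below is only an illustration (function names are ours): it checks every 2-coloring of $K_5$ and of $K_6$ for a monochromatic triangle. It is a finite verification, not a substitute for a proof.

```python
from itertools import combinations, product
from math import comb


def has_mono_clique(n, k, color):
    """True if some k vertices of K_n span edges of a single color.

    'color' maps each edge (i, j) with i < j to 0 or 1.
    """
    for verts in combinations(range(n), k):
        if len({color[e] for e in combinations(verts, 2)}) == 1:
            return True
    return False


def every_coloring_has_mono_clique(n, k):
    """Exhaustively test all 2^C(n,2) two-colorings of K_n."""
    edges = list(combinations(range(n), 2))
    for bits in product((0, 1), repeat=len(edges)):
        if not has_mono_clique(n, k, dict(zip(edges, bits))):
            return False
    return True


# The counts quoted in the remark for K_43.
edges_in_k43 = comb(43, 2)
five_cliques_in_k43 = comb(43, 5)

if __name__ == "__main__":
    print("K_5 forces a monochromatic triangle:", every_coloring_has_mono_clique(5, 3))
    print("K_6 forces a monochromatic triangle:", every_coloring_has_mono_clique(6, 3))
    print(edges_in_k43, five_cliques_in_k43)
```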

We will prove the above fundamental result by providing upper and lower bounds 
on R(k). A nice, elementary combinatorial argument yields the following result. 

Theorem 9.3.

\[
R(k) \le 4^{k-1}, \quad k \ge 3. \tag{9.43}
\]

Remark. The above estimate is not far from the best known asymptotic upper bound
for $R(k)$. In particular, it is not known if $R(k) \le c^k$, for large $k$ and some $c < 4$.
For the best known upper bound, see [12].

Proof. Let $k \ge 3$. Consider an arbitrary coloring of the complete graph $K_{4^{k-1}}$ of
size $4^{k-1} = 2^{2k-2}$. Define $x_1 = 1$ and $S_0 = K_{4^{k-1}}$. Since $x_1$ shares an edge with
$2^{2k-2} - 1$ vertices, there must be a set of vertices $S_1$ of size at least $2^{2k-3}$ such
that every edge from $x_1$ to a vertex in $S_1$ is the same color. This is the so-called
pigeonhole principle. Let $x_2$ denote the vertex in $S_1$ with the lowest number. By the
same reasoning, since $x_2$ shares an edge with all the other vertices in $S_1$, of which
there are at least $2^{2k-3} - 1$, there must be a set $S_2 \subset S_1$ of size at least $2^{2k-4}$ such
that every edge from $x_2$ to a vertex in $S_2$ has the same color. Continuing like this,
we obtain a sequence $x_1, \ldots, x_{2k-2}$ of vertices and a decreasing, nested sequence of
sets of vertices $\{S_j\}_{j=0}^{2k-3}$ such that $x_j \in S_{j-1}$, $j \in [2k-2]$. By the construction,
it follows that for each $i$, the color of the edge joining $x_i$ to $x_j$ is the same for all
$j > i$. Now look at the $2k - 3$ edges $\{\{x_i, x_{i+1}\}\}_{i=1}^{2k-3}$. Obviously, we can choose
at least $k - 1$ of these edges to be all the same color. Find $k - 1$ such edges
$\{x_{i_1}, x_{i_1+1}\}, \ldots, \{x_{i_{k-1}}, x_{i_{k-1}+1}\}$, with $i_1 < \cdots < i_{k-1}$, and let
$S = \{x_{i_1}, \ldots, x_{i_{k-1}}, x_{i_{k-1}+1}\}$. Note that $|S| = k$. Because the color of the edge
joining $x_i$ to $x_j$ is the same for all $j > i$, and in particular coincides with the color of
$\{x_i, x_{i+1}\}$, it follows in fact that the color of the edge
joining any two vertices in $S$ is the same. We have thus exhibited a monochromatic
clique of size $k$. □

Despite the fact that the Ramsey number $R(k)$ is a quantity associated with a
purely deterministic result, one can give a very short and ingenious probabilistic
proof of a lower bound for $R(k)$.

Theorem 9.4. $R(k) > k$, for all $k \ge 3$, and

\[
R(k) \ge \frac1e (1 + o(1))\, k\, 2^{\frac k2}, \quad \text{as } k \to \infty. \tag{9.44}
\]

Remark. The best known lower bound is just $\sqrt2$ times the above estimate; see [2].
Thus, a real chasm lies between the best known upper bound and the best known
lower bound!

Proof. Consider a random two-coloring of the graph $K_n$, where each edge is colored
red or blue with equal probability, and independently of what occurs at other edges.
Let $W$ be a clique in $K_n$ of size $k$, with $3 \le k \le n$. Let $I_W$ be the indicator random
variable, which is equal to 1 if $W$ is monochromatic, and equal to 0 otherwise.
Since there are $\binom{k}{2}$ edges in $W$, the probability that $W$ is all blue (or all red) is
$(\frac12)^{\binom{k}{2}}$; consequently, the probability that $W$ is monochromatic is $2^{1-\binom{k}{2}}$. Of course,
the expected value $EI_W$ of $I_W$ is also equal to $2^{1-\binom{k}{2}}$.

For $3 \le k \le n$, let $X_k = \sum_{|W|=k} I_W$. The random variable $X_k$ counts the
number of monochromatic cliques of size $k$ in $K_n$. We have

\[
EX_k = \sum_{|W|=k} EI_W = \binom{n}{k} 2^{1-\binom{k}{2}}.
\]

Since the average number of monochromatic cliques of size $k$ in this random
two-coloring is equal to $\binom{n}{k} 2^{1-\binom{k}{2}}$, there certainly must exist some particular
two-coloring with exactly $M$ monochromatic cliques of size $k$, for some $M \le \binom{n}{k} 2^{1-\binom{k}{2}}$.
Consider such a two-coloring. From each of the $M$ monochromatic cliques of size $k$,
remove one of the vertices. Let $M'$ denote the number of vertices removed. We have
$M' \le M$. (It is possible that $M' < M$ because we might have removed the same
vertex from more than one of the cliques.) What remains is a two-coloring of the
complete graph on $n - M'$ vertices, and by construction, this two-coloring has no
monochromatic cliques of size $k$. We conclude that

\[
R(k) > n - \binom{n}{k} 2^{1-\binom{k}{2}}, \quad \text{for any } n \ge k.
\]

In particular, choosing $n = k + 1$, one obtains $R(k) > k + 1 - (k+1)2^{1-\binom{k}{2}}$, and
it is easy to check that the right hand side is greater than or equal to $k$, for all $k \ge 3$.
In Exercise 9.3 the reader is asked to show that

\[
\max_{k \le n < \infty} \Big(n - \binom{n}{k} 2^{1-\binom{k}{2}}\Big) = \frac1e (1 + o(1))\, k\, 2^{\frac k2}, \quad \text{as } k \to \infty. \tag{9.45}
\]

□
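
The quantity maximized in (9.45) is easy to tabulate. The Python sketch below (function names are ours) evaluates $n - \binom{n}{k}2^{1-\binom{k}{2}}$ and scans $n \ge k$ for the maximum; the scan can stop at the first negative value because the successive differences $1 - \binom{n}{k-1}2^{1-\binom{k}{2}}$ decrease in $n$, so the expression is concave along the integers. The comparison column $\frac1e k 2^{k/2}$ is the leading term in (9.45); agreement is only asymptotic, so moderate $k$ may still show a visible gap.

```python
import math


def deletion_bound(n, k):
    """The lower bound n - C(n, k) 2^(1 - C(k, 2)) on R(k) obtained in the proof."""
    return n - math.comb(n, k) * 2.0 ** (1 - math.comb(k, 2))


def best_deletion_bound(k):
    """Maximize deletion_bound(n, k) over integers n >= k; returns (n, value)."""
    best_n, best_val = k, deletion_bound(k, k)
    n = k
    while True:
        n += 1
        val = deletion_bound(n, k)
        if val > best_val:
            best_n, best_val = n, val
        if val < 0:
            # Concavity in n: once negative, the expression keeps decreasing.
            return best_n, best_val


if __name__ == "__main__":
    for k in (5, 10, 15, 20):       # illustrative values
        n_star, value = best_deletion_bound(k)
        print(k, n_star, value, k * 2 ** (k / 2) / math.e)
```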

Remark. The strategy used to prove Theorem 9.4 is known as the probabilistic
method. It was pioneered by P. Erdős. He used the method in a slightly different
way from above and obtained a lower bound on $R(k)$ with an extra factor of $\sqrt2$ in
the denominator on the right hand side of (9.44).

Exercise 9.1. Show that the total variation distance $D_{TV}(\mu, \nu)$ defined in (9.30)
satisfies (9.31).

Exercise 9.2. This exercise presents an alternative approach in place of the final
two paragraphs of the proof of part (ii) of Theorem 9.2. Recall that the Cauchy-Schwarz
inequality states that $\big|\sum_{i=1}^m a_i b_i\big| \le \sqrt{\big(\sum_{i=1}^m a_i^2\big)\big(\sum_{i=1}^m b_i^2\big)}$, where
$\{a_i\}_{i=1}^m$, $\{b_i\}_{i=1}^m$ are real numbers and $m$ is a positive integer.

a. Use (9.36) and the Cauchy-Schwarz inequality to show that for any $A \subset
\mathcal{P}([n]^{(2)})$, one has

\[
|\mu_{n;\mathrm{tam}}(A) - \mu_n(A)| \le \sqrt{\mu_n(A)}\, \sqrt{\sum_{\omega \in A} \Big(\frac{N^{(k_n)}_{n,p}(\omega)}{EN^{(k_n)}_{n,p}} - 1\Big)^2 \mu_n(\omega)}
\le \sqrt{\sum_{\omega \in \mathcal{P}([n]^{(2)})} \Big(\frac{N^{(k_n)}_{n,p}(\omega)}{EN^{(k_n)}_{n,p}} - 1\Big)^2 \mu_n(\omega)}. \tag{9.46}
\]

b. The expression on the right hand side of (9.46) is called the $L^2$-norm with respect
to the measure $\mu_n$ of the function $\frac{N^{(k_n)}_{n,p}(\omega)}{EN^{(k_n)}_{n,p}} - 1$, which is defined on the domain
$\mathcal{P}([n]^{(2)})$. We denote this norm by $\Big\|\frac{N^{(k_n)}_{n,p}}{EN^{(k_n)}_{n,p}} - 1\Big\|_{2;\mu_n}$. Use (9.16) (where the
notation $N_{n,p}(k_n)$ instead of $N^{(k_n)}_{n,p}$ is used), which holds for $k_n$ as in part (ii)
of Theorem 9.2, to prove that

\[
\lim_{n\to\infty} \Big\|\frac{N^{(k_n)}_{n,p}}{EN^{(k_n)}_{n,p}} - 1\Big\|_{2;\mu_n} = 0. \tag{9.47}
\]

c. Conclude from (9.46) and (9.47) that (9.42) holds.

Exercise 9.3. Show that (9.45) holds. (Hint: Let $f_{1,k}(x) = x - \frac{2^{1-\binom{k}{2}} x^k}{k!}$ and
$f_{2,k}(x) = x - \frac{2^{1-\binom{k}{2}} (x-k)^k}{k!}$. Show that $\max_{k \le x < \infty} f_{1,k}(x) \sim \max_{k \le x < \infty} f_{2,k}(x)$
as $k \to \infty$. Since $\frac{(n-k)^k}{k!} \le \binom{n}{k} \le \frac{n^k}{k!}$, it then follows that
$\max_{k \le n < \infty} \big(n - \binom{n}{k} 2^{1-\binom{k}{2}}\big) \sim \max_{k \le x < \infty} f_{1,k}(x)$, as $k \to \infty$. To obtain the asymptotic behavior
of $\max_{k \le x < \infty} f_{1,k}(x)$ you will need Stirling's formula.)

Exercise 9.4. Figure 9.4 shows that the Ramsey number $R(3)$ satisfies $R(3) > 5$.
Prove that $R(3) = 6$.


Chapter Notes 

For a wide scope of results concerning graphs, deterministic and random, see
Bollobás' books [9] and [10].

For a paper that considers tampering detection, see [29]. In particular, one
finds there two examples that show that the intuition for Theorem 9.2, discussed
in the remark following the theorem, can fail. It should be noted that the word
“detection” must be understood here in a very theoretical way, as there are no known
algorithms for detecting this clique in a reasonable amount of time, namely an
amount of time which grows no more than polynomially in the number of vertices $n$.
The construction of such algorithms is known in the theoretical computer science
literature as the “planted clique” problem. See, for example, the paper of Alon et al.
[3], where for $p = \frac12$ it is shown that a planted clique of order $n^{\frac12}$ can be detected in
polynomial time. (This order for the clique is of course far, far larger than the order
$\log n$ for the cliques discussed in this chapter.)

The proof of the existence of the Ramsey number $R(k)$ goes back to F. Ramsey
in 1930. The nice little book by Alon and Spencer [2] is devoted entirely to the
probabilistic method in combinatorics. The book by Graham et al. [22] is devoted
entirely to Ramsey theory.


Chapter 10 

The Phase Transition Concerning the Giant 
Component in a Sparse Random Graph: 

A Theorem of Erdős and Rényi


10.1 Introduction and Statement of Results 

Let $G_n(p_n) = ([n], E_n(p_n))$ denote the Erdős-Rényi graph of size $n$ which was
introduced in Chap. 9. As in Chap. 9, the generic notation $P$ for probability and
$E$ for expectation will be used in this chapter. Note that whereas in Chap. 9 the
edge probability $p$ was fixed independent of the graph size, in this chapter the
edge probability $p_n$ will vary with $n$. A subset $A \subset [n]$ of the vertex set $[n]$ is
called connected if for every $x, y \in A$, there exists a path between $x$ and $y$ along
edges in $E_n(p_n)$. The vertex set $[n]$ is of course equal to the disjoint union of its
connected components. Let $C_n^{\mathrm{lg}}$ be the random variable denoting the size of the
largest connected component in the random graph $G_n(p_n)$. It turns out that the
size of the largest connected component undergoes a striking phase transition as
the edge probability passes from $\frac{c}{n}$ with $c < 1$ to $\frac{c}{n}$ with $c > 1$. In this chapter we
will prove the following two theorems.

Theorem 10.1. Let $p_n = \frac{c}{n}$ with $c < 1$. Then there exists a $\gamma = \gamma(c)$ such that the
size $C_n^{\mathrm{lg}}$ of the largest connected component of $G_n(p_n)$ satisfies

\[
\lim_{n\to\infty} P\left(C_n^{\mathrm{lg}} \le \gamma \log n\right) = 1.
\]

Theorem 10.2. Let $p_n = \frac{c}{n}$ with $c > 1$. Then there exists a unique solution $\beta = \beta(c) \in (0, 1)$
to the equation $1 - e^{-cx} - x = 0$. For any $\epsilon > 0$, the size $C_n^{\mathrm{lg}}$ of the
largest connected component of $G_n(p_n)$ satisfies

\[
\lim_{n\to\infty} P\left((1-\epsilon)\beta n \le C_n^{\mathrm{lg}} \le (1+\epsilon)\beta n\right) = 1. \tag{10.1}
\]

Furthermore, every other connected component of $G_n(p_n)$ is of size $O(\log n)$ as
$n \to \infty$; that is, letting $C_n^{\mathrm{2nd\text{-}lg}}$ denote the size of the second largest component,
then for some $\gamma = \gamma(c)$,

\[
\lim_{n\to\infty} P\left(C_n^{\mathrm{2nd\text{-}lg}} \le \gamma \log n\right) = 1. \tag{10.2}
\]

Remark 1. In light of (10.1) and (10.2), when $p_n = \frac{c}{n}$ with $c > 1$, the largest
component is referred to as the giant component.
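
The phase transition described in Theorems 10.1 and 10.2 is easy to observe numerically. The following is a rough simulation sketch, not part of the proofs: the function name `component_sizes`, the choice $n = 1000$, the values $c = 0.5$ and $c = 2$, and the seed are all illustrative assumptions. It samples each possible edge independently and tracks components with a union-find structure.

```python
import random
from collections import Counter

def component_sizes(n, p, rng):
    """Sizes of the connected components of one sample of G_n(p), largest first."""
    parent = list(range(n))

    def find(v):
        # Union-find root lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:          # edge {u, v} is present with probability p
                ru, rv = find(u), find(v)
                if ru != rv:
                    parent[ru] = rv
    return sorted(Counter(find(v) for v in range(n)).values(), reverse=True)

n = 1000
rng = random.Random(0)
sub = component_sizes(n, 0.5 / n, rng)    # c = 0.5 < 1
sup = component_sizes(n, 2.0 / n, rng)    # c = 2 > 1
print("c = 0.5, two largest components:", sub[:2])
print("c = 2.0, two largest components:", sup[:2])
```

For $c = 2$ one expects the largest component to contain roughly $\beta(2) n \approx 0.8n$ vertices, while all other components, and all components when $c = 0.5$, remain of logarithmic size.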

Remark 2. It follows from the above theorems that when $p_n = \frac{c}{n}$, for some $c > 0$,
the probability that the graph is connected approaches 0 as $n \to \infty$. This can be
proved directly far more easily than the above theorems can be proved. Indeed, in
Exercise 10.3, the reader is guided through a proof of the following fact concerning
disconnected vertices, that is, vertices that are not connected to any other vertices:
If $p_n = \frac{\log n + c_n}{n}$, then as $n \to \infty$, the probability of there being at least
one disconnected vertex approaches 0 if $\lim_{n\to\infty} c_n = \infty$, while for any $M$,
the probability of there being at least $M$ disconnected vertices approaches 1 if
$\lim_{n\to\infty} c_n = -\infty$. Actually, there is a totally trivial way to see that if $p_n = \frac{c}{n}$,
then the probability that the graph is connected does not approach 1. Indeed, simply
note that the probability that any particular vertex is disconnected is $(1 - \frac{c}{n})^{n-1}$; thus
as $n \to \infty$, the probability that any particular vertex is disconnected converges to
$e^{-c}$. The above theorems and this discussion naturally elicit the question, how large
must $p_n$ be in order that the graph be connected? The answer to this was given also
by Erdős and Rényi, who proved that the above threshold probability concerning
whether or not the graph possesses disconnected vertices is also the threshold for
connectivity: If $p_n = \frac{\log n + c_n}{n}$, then as $n \to \infty$, the probability of the graph being
connected approaches 1 if $\lim_{n\to\infty} c_n = \infty$ and approaches 0 if $\lim_{n\to\infty} c_n = -\infty$.
See [9].

Remark 3. If a connected component of a graph contains $m$ vertices, then it must
contain at least $m - 1$ edges. Thus, it follows from (10.1) that if $c > 1$, then for any
$\epsilon > 0$ and for large $n$, with high probability, the random graph $G_n(\frac{c}{n})$ will contain
at least $(1 - \epsilon)\beta(c)n$ edges. In Sect. 9.1 of Chap. 9, it was shown that for any $\epsilon > 0$,
with high probability, the graph $G_n(p)$ has $\frac{1}{2}n^2 p + O(n^{1+\epsilon})$ edges. The same type
of analysis shows that for any $\epsilon > 0$ and large $n$, with high probability the graph
$G_n(\frac{c}{n})$ has $\frac{1}{2}cn + O(n^{\frac{1}{2}+\epsilon})$ edges. Thus, one must have $\beta(c) \le \frac{c}{2}$, for $1 < c < 2$.
See Exercise 10.1.

In Sect. 10.2 we construct the setup that will be used for the proofs of 
Theorems 10.1 and 10.2. In particular, we construct and analyze probabilistically 
an algorithm that calculates for each vertex of the graph the size of the connected 
component to which it belongs. In Sect. 10.3 we present a couple of basic large 
deviations estimates that will be needed for the proofs of the theorems. The results 
of Sects. 10.2 and 10.3 will allow for a quick proof of Theorem 10.1 in Sect. 10.4. In 
Sect. 10.5, we give a concise presentation of the Galton-Watson branching process 
and prove the most basic theorem of this subject, concerning the probability of 
extinction. This will be used for one part of the proof of Theorem 10.2, which
is presented in Sect. 10.6. The proof of Theorem 10.2 requires considerably more 
technical work over and above that which is required for the proof of Theorem 10.1. 


10.2 Construction of the Setup for the Proofs 
of Theorems 10.1 and 10.2 

Let $x \in [n]$ be a vertex of the random graph. All the random quantities that we define
below depend on $x$ and $n$, but we suppress this dependence in the notation. We
construct an algorithm that produces the connected component to which $x$ belongs.
We begin by calling $x$ “alive” and calling all of the other vertices in $[n]$ “neutral.”
We define $Y_0 = 1$, to indicate that at the beginning there is one vertex that is alive.
Each of the neutral vertices $y$ is now observed. If there is an edge connecting $x$ to $y$,
that is, if $\{x, y\} \in E_n(p_n)$, then $y$ is declared alive; if not, then $y$ remains neutral.
After every such $y$ has been checked, we declare $x$ to be “dead.” We define $Y_1$ to
be the new number of vertices that are alive. We also say that at time $t = 1$ there
is one dead vertex. This ends the first step of the algorithm. We continue like this.
If at the end of step $t$ there are $Y_t > 0$ vertices that are alive (and $t$ dead vertices),
we begin step $t + 1$ by selecting one of the alive vertices (it doesn't matter which
one) and call it $z$. Each of the currently neutral vertices $y$ is now observed. If there is
an edge connecting $z$ and $y$, then $y$ is declared alive; if not, then $y$ remains neutral.
After every such $y$ has been checked, we declare $z$ to be “dead.” We define $Y_{t+1}$ to
be the new number of vertices that are alive, and we say that at time $t + 1$ there are
$t + 1$ dead vertices. The process stops at the end of the step $T$ for which $Y_T = 0$.
It follows that at the end of step $T$, there are $T$ dead vertices. A little thought shows
that these dead vertices form the connected component to which $x$ belongs. Thus
$T$ is the size of the connected component to which $x$ belongs. (The reader should
verify this.) See Fig. 10.1. Of course, $T$ is a random variable since it depends on the
random edge configuration $E_n(p_n)$.
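
The exploration just described can be sketched in a few lines of code. This is a minimal illustration under stated assumptions: the function name `explore_component`, the callback `has_edge`, and the rule of always selecting the alive vertex that has been alive longest are hypothetical choices (the text allows any alive vertex), and the edge set below is merely one edge set consistent with the trace in Fig. 10.1.

```python
def explore_component(x, vertices, has_edge):
    """Run the alive/neutral/dead exploration started from x.

    has_edge(u, v) is a hypothetical callback reporting whether {u, v} is an edge.
    Returns (dead, Y): dead lists the vertices in the order in which they die,
    and Y[t] is the number of alive vertices at the end of step t (Y[0] = 1).
    """
    neutral = [v for v in vertices if v != x]
    alive = [x]
    dead = []
    Y = [1]
    while alive:
        z = alive.pop(0)                      # any alive vertex would do
        newly_alive = [y for y in neutral if has_edge(z, y)]
        neutral = [y for y in neutral if y not in newly_alive]
        alive.extend(newly_alive)
        dead.append(z)                        # z is declared dead
        Y.append(len(alive))
    return dead, Y

# One edge set on the vertices {1, ..., 6} that is consistent with Fig. 10.1.
edges = {frozenset(e) for e in [(1, 2), (1, 4), (2, 5)]}
dead, Y = explore_component(1, range(1, 7), lambda u, v: frozenset((u, v)) in edges)
T = len(dead)
print(dead, Y, T)
```

Each pair of vertices is passed to `has_edge` at most once, which is the feature of the algorithm that the probabilistic analysis relies on.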

For $1 \le t \le T$, define $Z_t$ to be the number of neutral vertices that are declared
alive at step $t$. Then from the description of the algorithm, we have for $1 \le t \le T$,

\[
Y_t = Y_{t-1} + Z_t - 1. \tag{10.3}
\]

Assuming that $t \le T$, at the end of step $t - 1$, there are $t - 1$ dead vertices and
$Y_{t-1} > 0$ alive vertices. Thus there are $n - t - Y_{t-1} + 1$ neutral vertices. A key
feature of the above algorithm is that no pair of vertices is ever checked twice.
Consequently, for every pair of vertices that is checked, the probability of there
being an edge between them is equal to $p_n$, independently of what occurred when
checking other pairs of vertices. Thus, since $Z_t$ counts how many of the
$n - t - Y_{t-1} + 1$ neutral vertices have a common edge with the alive vertex $z$ that has been
selected for implementing step $t$, and since the probability of there being an edge
from $z$ to any given neutral vertex is $p_n$, it follows that $Z_t$ is distributed according to
the binomial distribution with parameters $n - t - Y_{t-1} + 1$ and $p_n$: for $1 \le t \le T$,

\[
Z_t \sim \mathrm{Bin}(n - t - Y_{t-1} + 1, p_n). \tag{10.4}
\]

x = 1
    y = 2: neutral → alive
    y = 3: neutral → neutral
    y = 4: neutral → alive
    y = 5: neutral → neutral
    y = 6: neutral → neutral
x = 1 is declared dead
Take alive vertex z = 2
    y = 3: neutral → neutral
    y = 5: neutral → alive
    y = 6: neutral → neutral
z = 2 is declared dead
Take alive vertex z = 4
    y = 3: neutral → neutral
    y = 6: neutral → neutral
z = 4 is declared dead
Take alive vertex z = 5
    y = 3: neutral → neutral
    y = 6: neutral → neutral
z = 5 is declared dead
There are no more alive vertices, so algorithm ends
Dead sites: {1, 2, 4, 5} = the connected component containing x = 1

Fig. 10.1 The algorithm



Of course, $Y_{t-1}$, which appears in the size parameter of the binomial distribution
above, is itself a random variable. The meaning of (10.4) is that conditioned on
knowing that $Y_{t-1} = y$, then $Z_t \sim \mathrm{Bin}(n - t - y + 1, p_n)$. Since no pair of vertices is
ever checked twice, and since from (10.3), $Y_{t-1}$ only depends on $\{Z_s\}_{s=1}^{t-1}$, it follows
that given the value of $Y_{t-1}$, and given that $T \ge t$, the random variable $Z_t$ and the
random variables $\{Z_s\}_{s=1}^{t-1}$ are conditionally independent; that is, for all $m \ge 1$ and
all $t \ge 2$,

\[
P\left(Z_t \in \cdot\,, \{Z_s\}_{s=1}^{t-1} \in \cdot \mid Y_{t-1} = m, T \ge t\right) =
P\left(Z_t \in \cdot \mid Y_{t-1} = m, T \ge t\right) P\left(\{Z_s\}_{s=1}^{t-1} \in \cdot \mid Y_{t-1} = m, T \ge t\right). \tag{10.5}
\]

As noted, (10.3) and (10.4) hold only up to time $T$; however it will be convenient
to define $Y_t$ and $Z_t$ recursively from (10.3) and (10.4) for all integers $0 \le t \le n$.
(Thus, e.g., if $T = t_0$, then we have $Y_{t_0} = 0$ (as well as $Z_{t_0} = 0$), and thus
$Z_{t_0+1} \sim \mathrm{Bin}(n - t_0, p_n)$ and $Y_{t_0+1} = Z_{t_0+1} - 1$.) In particular, for $t > T$, $Y_t$
can take on negative values. For $1 \le t \le T$, note that the number $N_t$ of neutral
vertices at the end of step $t$ is given by $N_t = n - t - Y_t$. We use this equation to
define $N_0$, namely, $N_0 = n - 1$, indicating that there are $n - 1$ neutral vertices before
the first step begins. We now use this equation to extend $N_t$ also to all $0 \le t \le n$.
We have the following key lemma.

Lemma 10.1.

\[
Y_t - 1 + t \sim \mathrm{Bin}\left(n - 1, 1 - (1 - p_n)^t\right), \quad t \ge 0.
\]

Proof. Since $N_t = n - t - Y_t = (n - 1) - (Y_t - 1 + t)$, the statement of the lemma
is equivalent to

\[
N_t \sim \mathrm{Bin}\left(n - 1, (1 - p_n)^t\right). \tag{10.6}
\]

We prove (10.6) by induction. Clearly (10.6) holds for $t = 0$. Now assume that for
some $t \ge 1$,

\[
N_{t-1} \sim \mathrm{Bin}\left(n - 1, (1 - p_n)^{t-1}\right). \tag{10.7}
\]

Using (10.3), we have

\[
N_t = n - t - Y_t = n - t - Y_{t-1} + 1 - Z_t = N_{t-1} - Z_t. \tag{10.8}
\]

However, from (10.4) and the definition of $N_{t-1}$, we have $Z_t \sim \mathrm{Bin}(N_{t-1}, p_n)$.
Thus, $N_{t-1} - Z_t \sim \mathrm{Bin}(N_{t-1}, 1 - p_n)$, and it follows from (10.8) that

\[
N_t \sim \mathrm{Bin}(N_{t-1}, 1 - p_n). \tag{10.9}
\]

By the inductive hypothesis (10.7), $N_{t-1}$ is the number of heads in $n - 1$ independent
coin flips, where on each flip the probability of heads is $(1 - p_n)^{t-1}$. Then (10.9)
states that $N_t$ is the number of “successes” in $n - 1$ independent trials, where each
trial consists of first tossing a coin with probability $(1 - p_n)^{t-1}$ of heads and then
tossing a second coin with probability $1 - p_n$ of heads, and a “success” is defined as
obtaining heads on both flips. This description of $N_t$ is the description of a random
variable distributed according to $\mathrm{Bin}(n - 1, (1 - p_n)^t)$. For an alternative derivation
that (10.9) and (10.7) imply (10.6), using generating functions, see Exercise 10.4.

□
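
The inductive step, namely that a $\mathrm{Bin}(N_{t-1}, 1 - p_n)$ variable with $N_{t-1} \sim \mathrm{Bin}(n-1, (1-p_n)^{t-1})$ is again binomial, can be checked exactly for small parameters. The sketch below uses exact rational arithmetic; the function names and the values $n - 1 = 7$, $p_n = 1/5$, $t = 3$ are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def binom_pmf(n, p, k):
    """P(Bin(n, p) = k); exact when p is a Fraction."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def thinned_pmf(n, p1, p2, k):
    """P(N = k) when M ~ Bin(n, p1) and, given M = m, N ~ Bin(m, p2)."""
    return sum(binom_pmf(n, p1, m) * binom_pmf(m, p2, k) for m in range(k, n + 1))

n_minus_1 = 7
p_n = Fraction(1, 5)
t = 3
q_prev = (1 - p_n) ** (t - 1)      # success probability in (10.7)
two_stage = [thinned_pmf(n_minus_1, q_prev, 1 - p_n, k) for k in range(n_minus_1 + 1)]
direct = [binom_pmf(n_minus_1, (1 - p_n) ** t, k) for k in range(n_minus_1 + 1)]
print(two_stage == direct)
```

Because the arithmetic is exact, equality of the two lists is an identity for these parameters rather than a numerical approximation.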


10.3 Some Basic Large Deviations Estimates 

We present two propositions which are known as large deviations estimates. The 
first proposition will be used in the proof of Theorem 10.1 and the second one will 
be used in the proof of Theorem 10.2. 

Proposition 10.1. Let $c \in (0, 1)$. For $n \in \mathbb{Z}^+$ and $t \ge 0$ with $\frac{tc}{n} \le 1$, let $S_{n,t} \sim
\mathrm{Bin}(n, \frac{tc}{n})$. Then there exists a $\kappa = \kappa(c) > 0$, independent of $n$ and $t$, such that

\[
P(S_{n,t} \ge t) \le e^{-\kappa t}.
\]

Remark. Note that $ES_{n,t} = n\left(\frac{tc}{n}\right) = tc < t$, since $c \in (0, 1)$.

Proof. For any $\lambda > 0$, we have

\[
P(S_{n,t} \ge t) \le \exp(-\lambda t) E \exp(\lambda S_{n,t}), \tag{10.10}
\]

since $\exp(\lambda(S_{n,t} - t)) \ge 1$ on the event $\{S_{n,t} \ge t\}$.

Since $S_{n,t}$ is the number of successes in $n$ independent Bernoulli trials, each
of which has probability $\frac{tc}{n}$ of success, it follows that $S_{n,t}$ can be represented as
$S_{n,t} = \sum_{j=1}^n B_j$, where the $\{B_j\}_{j=1}^n$ are independent and identically distributed
Bernoulli random variables with parameter $\frac{tc}{n}$; that is, $P(B_j = 1) = 1 - P(B_j = 0) = \frac{tc}{n}$.
Using the fact that these random variables are independent and identically
distributed, we have

\[
E \exp(\lambda S_{n,t}) = E \exp\Big(\lambda \sum_{j=1}^n B_j\Big) = \prod_{j=1}^n E \exp(\lambda B_j) = \left(E \exp(\lambda B_1)\right)^n =
\left(1 - \frac{tc}{n} + \frac{tc}{n}e^{\lambda}\right)^n. \tag{10.11}
\]

Since $1 + y \le e^y$, for all $y$, we have $(1 + \frac{x}{n})^n \le e^x$, for all $x \ge 0$ and all $n \ge 1$.
Thus, $(1 - \frac{tc}{n} + \frac{tc}{n}e^{\lambda})^n \le e^{tc(e^{\lambda} - 1)}$, and consequently, (10.10) and (10.11) give for
any $\lambda > 0$

\[
P(S_{n,t} \ge t) \le e^{-\lambda t} e^{tc(e^{\lambda} - 1)} = \exp\left(-(\lambda - ce^{\lambda} + c)t\right). \tag{10.12}
\]

The function $f(\lambda) := \lambda - ce^{\lambda} + c$ satisfies $f(0) = 0$, and $f'(0) > 0$, since
$c \in (0, 1)$. Thus, there exist $\kappa = \kappa(c) > 0$ and $\lambda_0 > 0$ such that $f(\lambda_0) = \kappa$. We
then conclude from (10.12) that $P(S_{n,t} \ge t) \le e^{-\kappa t}$. □
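
To see how the bound behaves in practice, one can compare the exact binomial tail with $e^{-\kappa t}$. The sketch below makes one choice that the proof does not need to make: it takes $\lambda_0 = \log(1/c)$, the maximizer of $f(\lambda) = \lambda - ce^{\lambda} + c$, which gives $\kappa = c - 1 - \log c$. The function names and the values $c = 0.5$, $n = 60$ are illustrative assumptions.

```python
from math import comb, exp, log

def binom_upper_tail(n, p, k):
    """P(Bin(n, p) >= k) by direct summation."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

def kappa(c):
    """Maximum of f(lam) = lam - c*exp(lam) + c over lam > 0, attained at lam = log(1/c)."""
    return c - 1 - log(c)

c, n = 0.5, 60
rows = []
for t in range(1, 41):                       # t*c/n <= 1 holds for these t
    tail = binom_upper_tail(n, t * c / n, t)
    rows.append((t, tail, exp(-kappa(c) * t)))

for t, tail, bound in rows[::10]:
    print(t, tail, bound)
```

The exact tail is typically much smaller than the bound; what matters for the proof of Theorem 10.1 is only that the decay is exponential in $t$ with a rate independent of $n$.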

Proposition 10.2. For each $n \in \mathbb{Z}^+$, let $S_n \sim \mathrm{Bin}(n, p)$, where $p \in (0, 1)$. Define

\[
\kappa(p_0, p) = p_0 \log\frac{p_0}{p} + (1 - p_0)\log\frac{1 - p_0}{1 - p}, \quad 0 < p, p_0 < 1.
\]

Then $\kappa(p_0, p) > 0$, if $p \ne p_0$, and

(i) if $p < p_0 < 1$, then

\[
P(S_n \ge p_0 n) \le e^{-\kappa(p_0, p)n}, \text{ for all } n;
\]

(ii) if $0 < p_0 < p$, then

\[
P(S_n \le p_0 n) \le e^{-\kappa(p_0, p)n}, \text{ for all } n.
\]

Remark. The function $\kappa(p_0, p)$ is a relative entropy. For more about this, see the
notes at the end of the chapter.

Proof. The following three facts show that (ii) follows from (i): $\hat{S}_n := n - S_n$ is
distributed according to the distribution $\mathrm{Bin}(n, 1 - p)$, $P(S_n \le p_0 n) = P(\hat{S}_n \ge
(1 - p_0)n)$ and $\kappa(1 - p_0, 1 - p) = \kappa(p_0, p)$. So it suffices to show that (i) holds and
that $\kappa(p_0, p) > 0$, if $p \ne p_0$.

Let $p_0 > p$. For any $\lambda > 0$, we have

\[
P(S_n \ge p_0 n) \le \exp(-\lambda p_0 n) E \exp(\lambda S_n), \tag{10.13}
\]

since $\exp(\lambda(S_n - p_0 n)) \ge 1$ on the event $\{S_n \ge p_0 n\}$. We can represent the random
variable $S_n$ as $S_n = \sum_{j=1}^n B_j$, where the $\{B_j\}_{j=1}^n$ are independent and identically
distributed Bernoulli random variables with parameter $p$; that is, $P(B_j = 1) =
1 - P(B_j = 0) = p$. Using the fact that these random variables are independent
and identically distributed, we have

\[
E \exp(\lambda S_n) = E \exp\Big(\lambda \sum_{j=1}^n B_j\Big) = \prod_{j=1}^n E \exp(\lambda B_j) = \left(E \exp(\lambda B_1)\right)^n =
(pe^{\lambda} + 1 - p)^n. \tag{10.14}
\]

Thus, from (10.13), we obtain the inequality

\[
P(S_n \ge p_0 n) \le \left[e^{-\lambda p_0}(pe^{\lambda} + 1 - p)\right]^n, \text{ for all } n \ge 1 \text{ and all } \lambda > 0. \tag{10.15}
\]

The function $f(\lambda) := e^{-\lambda p_0}(pe^{\lambda} + 1 - p)$, $\lambda \ge 0$, satisfies $f(0) = 1$,
$\lim_{\lambda\to\infty} f(\lambda) = \infty$ and $f'(0) = -p_0 + p < 0$. Consequently, $f$ possesses
a global minimum at some $\lambda_0 > 0$, and $f(\lambda_0) \in (0, 1)$ [indeed, $f(\lambda_0) \le 0$
would contradict (10.15)]. In Exercise 10.5, the reader is asked to show that
$f(\lambda_0) = \left(\frac{1-p}{1-p_0}\right)^{1-p_0}\left(\frac{p}{p_0}\right)^{p_0}$. Note now that $\kappa(p_0, p)$, defined in the statement of
the proposition, is equal to $-\log f(\lambda_0)$. Thus, $\kappa(p_0, p) > 0$, for $p_0 > p$, and
$e^{-\kappa(p_0, p)} = f(\lambda_0)$. Substituting $\lambda = \lambda_0$ in (10.15) gives

\[
P(S_n \ge p_0 n) \le e^{-\kappa(p_0, p)n}.
\]

Finally, since $\kappa(p_0, p) = \kappa(1 - p_0, 1 - p)$, it follows that $\kappa(p_0, p) > 0$, if $p \ne p_0$. □
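
A quick numerical comparison of both parts of Proposition 10.2 with exact binomial tails may be helpful. This is an illustrative sketch only; the function names and the parameter values $n = 80$, $p = 0.3$, $p_0 = 0.5$ and $p_0 = 0.1$ are assumptions chosen so that $p_0 n$ is an integer.

```python
from math import comb, exp, log

def rel_entropy(p0, p):
    """kappa(p0, p) = p0*log(p0/p) + (1 - p0)*log((1 - p0)/(1 - p))."""
    return p0 * log(p0 / p) + (1 - p0) * log((1 - p0) / (1 - p))

def binom_pmf(n, p, j):
    return comb(n, j) * p**j * (1 - p) ** (n - j)

n, p = 80, 0.3

# Part (i): p < p0, upper tail P(S_n >= p0*n) with p0*n = 40.
p0_hi = 0.5
upper_tail = sum(binom_pmf(n, p, j) for j in range(40, n + 1))
upper_bound = exp(-rel_entropy(p0_hi, p) * n)

# Part (ii): p0 < p, lower tail P(S_n <= p0*n) with p0*n = 8.
p0_lo = 0.1
lower_tail = sum(binom_pmf(n, p, j) for j in range(0, 9))
lower_bound = exp(-rel_entropy(p0_lo, p) * n)

print(upper_tail, upper_bound)
print(lower_tail, lower_bound)
```

The symmetry $\kappa(1 - p_0, 1 - p) = \kappa(p_0, p)$ used at the start of the proof can be checked with the same `rel_entropy` function.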


10.4 Proof of Theorem 10.1 

In this section, and also in Sect. 10.6, we will use tacitly the following facts, which
are left to the reader in Exercise 10.6:

1. If $X_i \sim \mathrm{Bin}(n_i, p)$, $i = 1, 2$, and $n_1 \ge n_2$, then $P(X_1 \ge k) \ge P(X_2 \ge k)$, for
all integers $k \ge 0$.

2. If $X_i \sim \mathrm{Bin}(n, p_i)$, $i = 1, 2$, and $p_1 \ge p_2$, then $P(X_1 \ge k) \ge P(X_2 \ge k)$, for
all integers $k \ge 0$.

We assume that $p_n = \frac{c}{n}$ with $c \in (0, 1)$. From the analysis in Sect. 10.2, we have
seen that for $x \in [n]$, the size of the connected component of $G_n(p_n)$ containing $x$
is given by $T = \min\{t \ge 0 : Y_t = 0\}$. (As noted in Sect. 10.2, the quantities $T$ and
$Y_t$ depend on $x$ and $n$, but this dependence is suppressed in the notation.) Let $\hat{Y}_t$ be
a random variable distributed according to the distribution $\mathrm{Bin}(n - 1, 1 - (1 - \frac{c}{n})^t)$.
Then from Lemma 10.1,

\[
P(T > t) \le P(Y_t > 0) = P(\hat{Y}_t > t - 1) = P(\hat{Y}_t \ge t).
\]

(The inequality above is not an equality because we have continued the definition
of $Y_t$ past the time $T$.) Let $\bar{Y}_t$ be a random variable distributed according to the
distribution $\mathrm{Bin}(n - 1, \frac{tc}{n})$. By Taylor's remainder formula, $(1 - x)^t \ge 1 - tx$,
for $0 \le x \le 1$ and $t$ a positive integer. Thus, $\frac{tc}{n} \ge 1 - (1 - \frac{c}{n})^t$, and consequently
$P(\hat{Y}_t \ge t) \le P(\bar{Y}_t \ge t)$. Thus, we have

\[
P(T > t) \le P(\bar{Y}_t \ge t). \tag{10.16}
\]

If $S_{n,t} \sim \mathrm{Bin}(n, \frac{tc}{n})$ as in Proposition 10.1, then $P(\bar{Y}_t \ge t) \le P(S_{n,t} \ge t)$. Using
this with (10.16) and Proposition 10.1, we conclude that there exists a $\kappa > 0$ such
that

\[
P(T > t) \le e^{-\kappa t}, \quad t \ge 0, \ n \ge 1. \tag{10.17}
\]

Let $\gamma > 0$ satisfy $\gamma\kappa > 1$. Then from (10.17) we have

\[
P(T > \gamma \log n) \le e^{-\kappa\gamma \log n} = n^{-\kappa\gamma}. \tag{10.18}
\]

We have proven that the probability that the connected component containing $x$ is
larger than $\gamma \log n$ is no greater than $n^{-\kappa\gamma}$. There are $n$ vertices in $G_n(p_n)$; thus the
probability that at least one of them is in a connected component larger than $\gamma \log n$
is certainly no larger than $n \cdot n^{-\kappa\gamma} = n^{1-\kappa\gamma} \to 0$ as $n \to \infty$. This completes the
proof of Theorem 10.1. □


10.5 The Galton-Watson Branching Process

We define a random population process in discrete time. Let $\{q_n\}_{n=0}^{\infty}$ be a nonnegative
sequence satisfying $\sum_{n=0}^{\infty} q_n = 1$. We will refer to $\{q_n\}_{n=0}^{\infty}$ as the offspring
distribution of the process. Consider an initial particle alive at time $t = 0$ and set
$X_0 = 1$ to indicate that the size of the initial population is 1. At time $t = 1$, this
particle gives birth to a random number of offspring and then dies. For each $n \in \mathbb{Z}^+$,
the probability that there were $n$ offspring is $q_n$. Let $X_1$ denote the population size
at time 1, namely the number of offspring of the initial particle. In general, at any
time $t \ge 1$, all of the $X_{t-1}$ particles alive at time $t - 1$ give birth to random numbers
of offspring and die. The new number of particles is $X_t$. The numbers of offspring
of the different particles throughout all the generations are assumed independent
of one another and are all distributed according to the same offspring distribution
$\{q_n\}_{n=0}^{\infty}$.

Fig. 10.2 A realization of a branching process that becomes extinct at n = 5

The random population process $\{X_t\}_{t=0}^{\infty}$ is called a Galton-Watson branching
process. Clearly, if $X_t = 0$ for some $t$, then $X_r = 0$ for all $r \ge t$. If this occurs,
we say the process becomes extinct; otherwise we say that the process survives.
See Fig. 10.2. If $q_0 = 0$, then the probability of survival is 1. Otherwise, there
is a positive probability of extinction, since at any time $t \ge 1$, there is a positive
probability (namely $q_0^{X_{t-1}}$) that all of the particles die without leaving any offspring,
in which case $X_t = 0$. The most fundamental question we can ask about this process
is whether it has a positive probability of surviving.

Let $W$ be a random variable distributed according to the offspring distribution:
$P(W = n) = q_n$. Let

\[
\mu = EW = \sum_{n=0}^{\infty} n q_n
\]

denote the mean number of offspring of a particle. It is easy to show that $EX_{t+1} =
\mu EX_t$ (Exercise 10.7), from which it follows that $EX_t = \mu^t$, $t \ge 0$. From this, it
follows that if $\mu < 1$, then $\lim_{t\to\infty} EX_t = 0$. Since $EX_t \ge P(X_t \ge 1)$, it follows
that $\lim_{t\to\infty} P(X_t \ge 1) = 0$, which means that the process has probability 1 of
extinction. The fact that $EX_t$ is growing exponentially in $t$ when $\mu > 1$ would
suggest, but not prove, that for $\mu > 1$ the probability of extinction is less than 1. In
fact, we can use the method of generating functions to prove the following result.
Define

\[
\phi(s) = \sum_{n=0}^{\infty} q_n s^n, \quad s \in [0, 1]. \tag{10.19}
\]

The function $\phi(s)$ is the probability generating function for the distribution $\{q_n\}_{n=0}^{\infty}$.

Theorem 10.3. Consider a Galton-Watson branching process with offspring distribution
$\{q_n\}_{n=0}^{\infty}$, where $q_0 > 0$. Let $\mu = \sum_{n=0}^{\infty} n q_n \in [0, \infty]$ denote the mean
number of offspring of a particle.

(i) If $\mu \le 1$, then the Galton-Watson process becomes extinct with probability 1.

(ii) If $\mu > 1$, then the Galton-Watson process becomes extinct with probability
$\alpha \in (0, 1)$, where $\alpha$ is the unique root $s \in (0, 1)$ of the equation $\phi(s) = s$.

Proof. If $q_0 + q_1 = 1$, then necessarily, $\mu < 1$. Thus, it follows from the paragraph
before the statement of the theorem that extinction occurs with probability 1.
Assume now that $q_0 + q_1 < 1$. Since the power series for $\phi(s)$ converges uniformly
for $s \in [0, 1 - \epsilon]$, for any $\epsilon > 0$, it follows that we can differentiate term by term
to get

\[
\phi'(s) = \sum_{n=0}^{\infty} n q_n s^{n-1} \ge 0, \quad \phi''(s) = \sum_{n=0}^{\infty} n(n - 1) q_n s^{n-2} \ge 0, \quad 0 \le s < 1.
\]

In particular then, since $q_0 + q_1 < 1$, $\phi$ is a strictly convex function on $[0, 1]$, and
consequently, so is $\psi(s) := \phi(s) - s$. We have $\psi(0) = q_0 > 0$ and $\psi(1) = 0$.
Also, $\lim_{s\to 1^-} \psi'(s) = \lim_{s\to 1^-} \phi'(s) - 1 = \mu - 1$. Since $\psi$ is strictly convex, it
follows that if $\mu \le 1$, then $\psi'(s) < 0$ for $s \in [0, 1)$, and consequently $\psi(s) > 0$,
for $s \in [0, 1)$. However, if $\mu > 1$, then $\psi'(s) > 0$ for $s < 1$ and sufficiently
close to 1. Using this along with the strict convexity and the fact that $\psi(0) > 0$
and $\psi(1) = 0$, it follows that there exists a unique $\alpha \in (0, 1)$ such that $\psi(\alpha) = 0$
and that $\psi(s) > 0$, for $s \in (0, \alpha)$, and $\psi(s) < 0$, for $s \in (\alpha, 1)$. (The reader should
verify this.) We have thus shown that

the smallest root $\alpha \in [0, 1]$ of the equation $\phi(z) = z$ satisfies
$\alpha \in (0, 1)$, if $\mu > 1$, and $\alpha = 1$, if $\mu \le 1$. Furthermore, in the case $\mu > 1$,
one has $\phi(s) > s$, for $s \in [0, \alpha)$, and $\phi(s) < s$, for $s \in (\alpha, 1)$. (10.20)

Now let $\kappa_t := P(X_t = 0)$ denote the probability that extinction has occurred by
time $t$. Of course, $\kappa_0 = 0$. We claim that

\[
\kappa_t = \phi(\kappa_{t-1}), \text{ for } t \ge 1. \tag{10.21}
\]

To prove this, first note that when $t = 1$, (10.21) says that $\kappa_1 = \phi(0) = q_0$, which
is of course true. Now consider $t > 1$. We first calculate $P(X_t = 0 \mid X_1 = n)$, the
probability that $X_t = 0$, conditioned on $X_1 = n$. By the conditioning, at time $t = 1$,
there are $n$ particles, and each of these particles will contribute independently to the
population size $X_t$ at time $t$, through $t - 1$ generations of branching. In order to
have $X_t = 0$, each of these $n$ “new” branching processes must become extinct by
time $t - 1$. The probability that any one of them becomes extinct by time $t - 1$ is,
by definition, $\kappa_{t-1}$. By the independence, it follows that the probability that they all
become extinct by time $t - 1$ is $\kappa_{t-1}^n$. We have thus proven that

\[
P(X_t = 0 \mid X_1 = n) = \kappa_{t-1}^n.
\]

Since $P(X_1 = n) = q_n$, we conclude that

\[
\kappa_t = P(X_t = 0) = \sum_{n=0}^{\infty} P(X_1 = n) P(X_t = 0 \mid X_1 = n) = \sum_{n=0}^{\infty} q_n \kappa_{t-1}^n = \phi(\kappa_{t-1}),
\]

proving (10.21). From its definition, $\kappa_t$ is nondecreasing, and $\kappa_{\mathrm{ext}} := \lim_{t\to\infty} \kappa_t$ is
the extinction probability. Letting $t \to \infty$ in (10.21) gives

\[
\kappa_{\mathrm{ext}} = \phi(\kappa_{\mathrm{ext}}). \tag{10.22}
\]

It follows immediately from (10.22) and (10.20) that $\kappa_{\mathrm{ext}} = 1$, if $\mu \le 1$. If $\mu > 1$,
then there are two roots $s \in [0, 1]$ of the equation $\phi(s) = s$, namely $s = \alpha$ and
$s = 1$. If $\kappa_{\mathrm{ext}} = 1$, then $\kappa_t > \alpha$ for sufficiently large $t$, and then by (10.20)
and (10.21), for such $t$, we have $\kappa_{t+1} = \phi(\kappa_t) < \kappa_t$, which contradicts the fact
that $\kappa_t$ is nondecreasing. Thus, we conclude that $\kappa_{\mathrm{ext}} = \alpha$. □
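
The recursion (10.21) is itself a practical way to compute the extinction probability: start from $\kappa_0 = 0$ and apply $\phi$ repeatedly. The sketch below does this for two hypothetical offspring distributions supported on $\{0, 1, 2\}$; the function name, the number of iterations, and both distributions are illustrative assumptions.

```python
def extinction_probability(q, steps=200):
    """Iterate kappa_t = phi(kappa_{t-1}) from kappa_0 = 0, as in (10.21).

    q[n] is the probability of n offspring; q is assumed to be a finite list.
    """
    def phi(s):
        return sum(q_n * s**n for n, q_n in enumerate(q))

    kappa = 0.0
    for _ in range(steps):
        kappa = phi(kappa)
    return kappa

# Mean 1/4 + 2*(1/2) = 5/4 > 1: phi(s) = s has the roots s = 1/2 and s = 1.
alpha = extinction_probability([0.25, 0.25, 0.5])

# Mean 1/4 + 2*(1/4) = 3/4 < 1: extinction is certain.
certain = extinction_probability([0.5, 0.25, 0.25], steps=2000)

print(alpha, certain)
```

In the first example the iterates increase to the smaller root $\frac{1}{2}$ and never reach the root at 1, which is the content of the final paragraph of the proof.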

At one point in the proof of Theorem 10.2, we will use the above result on the
extinction probability of a Galton-Watson branching process. However, we will
need to consider this process in an alternative form. In the original formulation,
at time $t$, the entire population of size $X_{t-1}$ that was alive at time $t - 1$ reproduces
and dies, and then $X_t$ is the new population size. In other words, time $t$ referred
to the $t$th generation of particles. In our alternative formulation, at each time $t$,
only one of the particles that was alive at time $t - 1$ reproduces and dies. Thus,
as before, we have $X_0 = 1$ to denote that we start with a single particle, and $X_1$
denotes the number of offspring that the original particle produces before it dies. At
time $t = 2$, instead of having all $X_1$ particles reproduce and die simultaneously, we
choose (arbitrarily) just one of these particles that was alive at time $t = 1$ and have
it reproduce and die. Then $X_2$ is equal to the new total population. We continue in
this way, at each step choosing just one of the particles that was alive at the previous
step. Since in any case, the number of offspring of any particle is independent of the
number of offspring of the other particles, it is clear that this new process has the
same extinction probability as the original one.


10.6 Proof of Theorem 10.2

We assume that $p_n = \frac{c}{n}$, with $c > 1$. From the analysis in Sect. 10.2, we have seen
that for $x \in [n]$, the size of the connected component of $G_n(p_n)$ containing $x$ is
given by $T = \min\{t \ge 0 : Y_t = 0\}$.

Consider a Galton-Watson branching process $\{X_t\}_{t=0}^{\infty}$ in the alternative form
described at the end of Sect. 10.5, and let the offspring distribution be the Poisson
distribution with parameter $c$; that is, $q_m = e^{-c}\frac{c^m}{m!}$. The probability generating
function of this distribution is given by

\[
\phi(s) = \sum_{m=0}^{\infty} q_m s^m = \sum_{m=0}^{\infty} e^{-c}\frac{c^m}{m!}s^m = e^{c(s-1)}. \tag{10.23}
\]

The expected number of offspring is equal to $c$. Since $c > 1$, it follows from
Theorem 10.3 that the extinction time $T_{\mathrm{ext}} = \inf\{t \ge 1 : X_t = 0\}$ satisfies

\[
P(T_{\mathrm{ext}} < \infty) = \alpha, \tag{10.24}
\]

where $\alpha \in (0, 1)$ is the unique solution $s \in (0, 1)$ to the equation $\phi(s) = s$, that is,
to the equation $e^{c(s-1)} = s$. Substituting $z = 1 - s$ in this equation, this becomes

$1 - \alpha$ is the unique root $z \in (0, 1)$ of the equation $1 - e^{-cz} - z = 0$. (10.25)
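
The root in (10.25), which is the constant $\beta(c)$ of Theorem 10.2, has no closed form, but it is easy to compute numerically. The sketch below uses bisection; the function name `giant_fraction` and the iteration count are illustrative assumptions, and the bracketing step relies on $c$ being comfortably larger than 1 so that floating point rounding near 0 is not an issue.

```python
from math import exp

def giant_fraction(c, iterations=100):
    """Root z in (0, 1) of 1 - exp(-c*z) - z = 0, for c > 1, by bisection."""
    def g(z):
        return 1 - exp(-c * z) - z

    lo = 1.0
    while g(lo) <= 0:        # g is positive just to the right of 0 because c > 1
        lo /= 2
    hi = 2 * lo              # g(hi) <= 0 by construction
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

beta = giant_fraction(2.0)
alpha = 1 - beta             # extinction probability of the Poisson(2) branching process
print(beta, alpha)
```

For $c = 2$ this gives $\beta \approx 0.797$, so by Theorem 10.2 the giant component of $G_n(\frac{2}{n})$ contains about 80% of the vertices.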

Let $\{W_t\}_{t=1}^{\infty}$ be a sequence of independent, identically distributed random
variables distributed according to the Poisson distribution with parameter $c$. If
$X_{t-1} \ne 0$, then $W_t$ will serve as the number of offspring of the particle chosen
for reproduction and death from among the $X_{t-1}$ particles alive at time $t - 1$. Then
we may represent the process $\{X_t\}_{t=0}^{\infty}$ by $X_0 = 1$ and

\[
X_t = X_{t-1} + W_t - 1, \quad 1 \le t \le T_{\mathrm{ext}}. \tag{10.26}
\]

If $T_{\mathrm{ext}} < \infty$, then of course $X_t = 0$ for all $t \ge T_{\mathrm{ext}}$. For any fixed $t \ge 1$, as soon as
one knows the values of $\{W_s\}_{s=1}^t$, one knows the values of $\{X_s\}_{s=1}^t$. (We note that
it might happen that these values of $\{W_s\}_{s=1}^t$ result in $X_{s_0} = 0$ for some $s_0 < t$,
in which case the values of $\{W_s\}_{s=s_0+1}^t$ are superfluous for determining the values
of $\{X_s\}_{s=1}^t$.) If $\bar{r} := \{r_s\}_{s=1}^t$ are the values obtained for $\{W_s\}_{s=1}^t$, let $\bar{l} := \{l_s\}_{s=1}^t$
denote the corresponding values for $\{X_s\}_{s=1}^t$. We write $\bar{l} = \bar{l}(\bar{r})$. Note that $T_{\mathrm{ext}} \ge t$
occurs if and only if $l_s > 0$, for $0 \le s \le t - 1$, or, equivalently, if and only if
$l_{t-1} > 0$.
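
The map $\bar{r} \mapsto \bar{l}(\bar{r})$ is deterministic, and a small example may make it concrete. The following sketch applies the recursion (10.26) to a given list of offspring counts; the function name and the sample sequence are illustrative assumptions.

```python
def path_from_offspring(r):
    """Values l_1, l_2, ... produced by X_t = X_{t-1} + W_t - 1 with X_0 = 1
    when W_s = r[s-1]; stops at the first time the population hits 0."""
    path = []
    x = 1
    for w in r:
        x = x + w - 1
        path.append(x)
        if x == 0:
            break            # later offspring values are superfluous
    return path

l_bar = path_from_offspring([2, 0, 1, 0, 3])
extinction_time = l_bar.index(0) + 1 if 0 in l_bar else None
print(l_bar, extinction_time)
```

With the offspring counts $2, 0, 1, 0, 3$, the population sizes are $2, 1, 1, 0$, so extinction occurs at time 4 and the fifth offspring count is never used.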

Now consider the process $\{Y_t\}$ introduced in Sect. 10.2. Recall that $T$ is
equal to the smallest $t$ for which $Y_t = 0$. Note from (10.3) and (10.26) that
$\{Y_t\}$ is defined recursively in a way very similar to the way $\{X_t\}$ is defined.
The difference is that the independent sequence of random variables $\{W_t\}_{t=1}^{\infty}$
distributed according to the Poisson distribution with parameter $c$ is replaced by
the sequence $\{Z_t\}$. The distribution of these latter random variables is given
by (10.4) and (10.5). (As noted in Sect. 10.2, $Y_t$, $T$, and $Z_t$ depend on $x$ and $n$,
but that dependence has been suppressed in the notation.) Because the form of the
recursion formula is the same for $\{X_s\}_{s=1}^t$ and $\{Y_s\}_{s=1}^t$, and because $X_0 = Y_0 = 1$,
it follows that if $\bar{r} = \{r_s\}_{s=1}^t$ are the values obtained for $\{Z_s\}_{s=1}^t$, and if $\bar{l}(\bar{r})$ satisfies
$l_{t-1} > 0$, then $\bar{l}(\bar{r})$ are the corresponding values for $\{Y_s\}_{s=1}^t$.

Since the random variables $\{W_s\}_{s=1}^t$ are independent, we have

\[
P\left(\{W_s\}_{s=1}^t = \bar{r}\right) = \prod_{s=1}^t P(W_s = r_s). \tag{10.27}
\]

By (10.4) and by the conditional independence condition (10.5), if $l_{t-1} > 0$, we
have

\[
P\left(\{Z_s\}_{s=1}^t = \bar{r}\right) = \prod_{s=1}^t P(Z_s = r_s \mid Y_{s-1} = l_{s-1}), \tag{10.28}
\]

where, for convenience, we define $l_0 = 1$. By (10.4), the distribution of $Z_s$,
conditioned on $Y_{s-1} = l_{s-1}$, is given by $\mathrm{Bin}(n - s - l_{s-1} + 1, \frac{c}{n})$. Since
$\lim_{n\to\infty}(n - s - l_{s-1} + 1)\frac{c}{n} = c$, it follows from the Poisson approximation to the binomial
distribution (see Proposition A.3 in Appendix A) that

\[
\lim_{n\to\infty} P(Z_s = r_s \mid Y_{s-1} = l_{s-1}) = P(W_s = r_s).
\]
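
The Poisson approximation used here can be observed numerically. The sketch below compares the $\mathrm{Bin}(n - s - l_{s-1} + 1, \frac{c}{n})$ probabilities with the Poisson probabilities for increasing $n$; the values chosen for $c$, $s$ and $l_{s-1}$, and the restriction to $r < 10$, are illustrative assumptions.

```python
from math import comb, exp, factorial

def binom_pmf(m, p, r):
    return comb(m, r) * p**r * (1 - p) ** (m - r)

def poisson_pmf(c, r):
    return exp(-c) * c**r / factorial(r)

c, s, l_prev = 2.0, 5, 3            # illustrative values of c, s and l_{s-1}
gaps = {}
for n in (100, 10_000, 1_000_000):
    m = n - s - l_prev + 1          # size parameter of the binomial in (10.4)
    gaps[n] = max(abs(binom_pmf(m, c / n, r) - poisson_pmf(c, r)) for r in range(10))

for n, gap in gaps.items():
    print(n, gap)
```

The largest discrepancy shrinks roughly in proportion to $\frac{1}{n}$, consistent with the limit stated above for fixed $s$ and $l_{s-1}$.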


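Thus, we conclude from (10.27) and (10.28) that for any fixed $t$,

\[
\lim_{n\to\infty} P\left(\{Z_s\}_{s=1}^t = \bar{r}\right) = P\left(\{W_s\}_{s=1}^t = \bar{r}\right),
\text{ for all } \bar{r} = \{r_s\}_{s=1}^t \text{ for which } l_{t-1}(\bar{r}) > 0,
\]

and, consequently, that for any fixed $t$,

\[
\lim_{n\to\infty} P\left(\{Y_s\}_{s=1}^t = \bar{l}\right) = P\left(\{X_s\}_{s=1}^t = \bar{l}\right), \text{ for all } \bar{l} = \{l_s\}_{s=1}^t \text{ satisfying } l_{t-1} > 0. \tag{10.29}
\]

Since $T_{\mathrm{ext}}$, the extinction time for $\{X_t\}_{t=0}^{\infty}$, is the smallest $t$ for which $X_t = 0$, and
$T_{\mathrm{ext}} \ge t$ is equivalent to $l_{t-1} > 0$, and since $T$ is the smallest $t$ for which $Y_t = 0$, it
follows from (10.29) that

\[
\lim_{n\to\infty} P(T < t) = P(T_{\mathrm{ext}} < t), \text{ for any fixed } t \ge 1. \tag{10.30}
\]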
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From (10.24), we have Hindoo P(T ext < t) — a \ thus, for any e > 0, there exists an 
integer r e such that P(T ext < t) e (a — |, a), if t > r € . It then follows from (10.30) 
that there exists an n\ t€ = n\ i€ (t) such that 

P(T < t) e (a — e,a + 6), if t > and ft > fti^(t). (10.31) 

We now analyze the probabilities P(en < T < (1— of— <?)ft) and P((l— af+6)ft < 
T < ft). We will show that these probabilities are very small. From (10.25), it 
follows that 1 — e~ cz > z, for z G (0, 1 — a), and 1 — e~ cz < z, for ze (1 — or, 1]. 
Consequently, for e > 0 , choosing 8 — 8(e) sufficiently small, we have 

1 — e~ cz — 2 8 > z, for z G (e, 1 — a — e); 1 — e~ cz + 28 < z, for z G (1 — a + 6, 1]. 

(10.32) 

Since T is the smallest t for which Y t — 0, we have 


{en <T < (1 - a - e)n} C U €n < t <(i - a - € ) n {Y t < 0}. 

(Recall that has also been defined recursively for t > T and can take on negative 

/V 

values for such t.) Thus, letting Y t be the random variable distributed according to 
the distribution Bin (n — 1, 1 — (1 — it follows from Lemma 10.1 that 

P(en < T < (1 -a -e)n) < P(Y,<t- 1). (10.33) 

€ n<t<(l— a— e)n 


One has $\lim_{n\to\infty} (1-\frac{c}{n})^{bn} = e^{-cb}$, uniformly over $b$ in a bounded set. (The reader should verify this by taking the logarithm of $(1-\frac{c}{n})^{bn}$ and applying Taylor's formula.) Applying this with $b = \frac{t}{n}$, with $0 \le t \le n$, it follows that $(1-\frac{c}{n})^t - e^{-\frac{ct}{n}}$ is small for large $n$, uniformly over $t \in [0, n]$. Thus, for $\delta = \delta(\epsilon)$, which has been defined above, there exists an $n_{2,\delta} = n_{2,\delta(\epsilon)}$ such that $1 - (1-\frac{c}{n})^t \ge 1 - e^{-\frac{ct}{n}} - \delta$, for $n \ge n_{2,\delta}$ and $0 \le t \le n$. Let $\bar Y_t$ be a random variable distributed according to the distribution $\text{Bin}(n-1, 1 - e^{-\frac{ct}{n}} - \delta)$. Then $P(\hat Y_t \le t-1) \le P(\bar Y_t \le t-1)$, if $n \ge n_{2,\delta}$. Using this with (10.33), we obtain

$$P(\epsilon n \le T \le (1-\alpha-\epsilon)n) \le \sum_{\epsilon n \le t \le (1-\alpha-\epsilon)n} P(\bar Y_t \le t-1), \quad n \ge n_{2,\delta}. \tag{10.34}$$
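The uniform closeness of $(1-\frac{c}{n})^t$ and $e^{-ct/n}$ over $0 \le t \le n$, which is what makes the choice of $n_{2,\delta}$ possible, can be observed numerically. This is a rough sketch under assumed values ($c = 2$ and three sample sizes chosen only for illustration); the helper name `max_gap` is hypothetical.

```python
import math

def max_gap(n: int, c: float) -> float:
    """Largest |(1 - c/n)**t - exp(-c*t/n)| over integers t = 0, 1, ..., n."""
    base = 1.0 - c / n
    return max(abs(base**t - math.exp(-c * t / n)) for t in range(n + 1))

# The maximal gap over the whole range of t should shrink as n grows.
gaps = {n: max_gap(n, c=2.0) for n in (100, 1_000, 10_000)}
```

Since the maximum is taken over every integer $t$ in $[0, n]$, a small value of `gaps[n]` is a statement about uniform closeness, not just closeness at one fixed $t$.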

Every $t$ in the summation on the right hand side of (10.34) is of the form $t = b_n n$, with $\epsilon \le b_n \le 1-\alpha-\epsilon$. Thus, it follows from (10.32) that $1 - e^{-\frac{ct}{n}} - \delta = 1 - e^{-cb_n} - \delta > b_n + \delta$. We now apply part (ii) of Proposition 10.2 with $n-1$ in place of $n$, with $p = 1 - e^{-cb_n} - \delta$, and with $p_0 = b_n$. Note that $p$ and $p_0$ are bounded from 0 and from 1 as $n$ varies and as $t$ varies over the above range. Also, we have $p > p_0 + \delta$. Consequently, there exists a constant $\kappa > 0$ such that $\kappa(p_0, p) \ge \kappa$, for all $p, p_0$ as above. Thus, we have for $n \ge n_{2,\delta}$,

$$P(\bar Y_t \le t-1) = P(\bar Y_t \le b_n n - 1) \le P(\bar Y_t \le b_n(n-1)) \le e^{-\kappa(n-1)}. \tag{10.35}$$

From (10.34) and (10.35) we conclude that

$$P(\epsilon n \le T \le (1-\alpha-\epsilon)n) \le (1-\alpha)n\, e^{-\kappa(n-1)}, \quad n \ge n_{2,\delta(\epsilon)}. \tag{10.36}$$

A very similar analysis shows that

$$P((1-\alpha+\epsilon)n \le T \le n) \le \alpha n\, e^{-\kappa(n-1)}, \quad n \ge n_{3,\delta(\epsilon)}, \tag{10.37}$$

for some $n_{3,\delta} = n_{3,\delta(\epsilon)}$. This is left to the reader as Exercise 10.8.

We now analyze the probability $P(t \le T \le \epsilon n)$, for fixed $t$. As in (10.33), we have

$$P(t \le T \le \epsilon n) \le \sum_{t \le s \le \epsilon n} P(\hat Y_s \le s-1), \tag{10.38}$$

where, we recall, $\hat Y_s$ is distributed according to the distribution $\text{Bin}(n-1, 1-(1-\frac{c}{n})^s)$. Let $\tilde Y_s$ be a random variable distributed according to the distribution $\text{Bin}(n-1, (1-\frac{c}{n})^s)$. Then

$$P(\hat Y_s \le s-1) = P(\tilde Y_s \ge n-s). \tag{10.39}$$

As in the proofs of Propositions 10.1 and 10.2, we have for any $\lambda > 0$

$$P(\tilde Y_s \ge n-s) \le e^{-\lambda(n-s)} E e^{\lambda \tilde Y_s}. \tag{10.40}$$

We can represent the random variable $\tilde Y_s$ as $\tilde Y_s = \sum_{j=1}^{n-1} B_j$, where the $\{B_j\}_{j=1}^{n-1}$ are independent and identically distributed Bernoulli random variables with parameter $(1-\frac{c}{n})^s$; that is, $P(B_j = 1) = 1 - P(B_j = 0) = (1-\frac{c}{n})^s$. Using the fact that these random variables are independent and identically distributed, we have

$$E e^{\lambda \tilde Y_s} = \prod_{j=1}^{n-1} E e^{\lambda B_j} = \Big[(1-\tfrac{c}{n})^s e^{\lambda} + 1 - (1-\tfrac{c}{n})^s\Big]^{n-1}. \tag{10.41}$$

Thus, from (10.40) and (10.41), we obtain

$$P(\tilde Y_s \ge n-s) \le e^{-\lambda(n-s)} \Big[(1-\tfrac{c}{n})^s e^{\lambda} + 1 - (1-\tfrac{c}{n})^s\Big]^{n-1}. \tag{10.42}$$




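We now substitute $n = Ms$ in (10.42) to obtain

$$P(\tilde Y_s \ge n-s) \le e^{-\lambda s(M-1)} \Big[(1-\tfrac{c}{Ms})^s (e^{\lambda}-1) + 1\Big]^{Ms-1} \le e^{-\lambda s(M-1)} \Big[(1-\tfrac{c}{Ms})^s (e^{\lambda}-1) + 1\Big]^{Ms} = e^{\big[-\lambda(M-1) + M\log\big((1-\frac{c}{Ms})^s(e^{\lambda}-1)+1\big)\big]s}, \quad \text{for all } \lambda > 0. \tag{10.43}$$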

We will show that for an appropriate choice of $\lambda > 0$, the expression in the square brackets above is negative and bounded away from 0 for all $s \ge 1$ and sufficiently large $M$. Let

$$f_{s,M}(\lambda) := -\lambda(M-1) + M \log\Big((1-\tfrac{c}{Ms})^s (e^{\lambda}-1) + 1\Big).$$

Then $f_{s,M}(0) = 0$ and

$$f'_{s,M}(\lambda) = -(M-1) + \frac{M (1-\frac{c}{Ms})^s e^{\lambda}}{(1-\frac{c}{Ms})^s (e^{\lambda}-1) + 1}. \tag{10.44}$$

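For any fixed $\lambda$, defining $g(y) = \frac{y e^{\lambda}}{y(e^{\lambda}-1)+1}$, for $y > 0$, it is easy to check that $g'(y) > 0$; therefore, $g$ is increasing. The last term on the right hand side of (10.44) is $Mg(y)$, with $y = (1-\frac{c}{Ms})^s$. Since $1 - x \le e^{-x}$, for $x \ge 0$, we have $(1-\frac{c}{Ms})^s \le e^{-\frac{c}{M}}$, if $n = Ms \ge c$, and thus the last term on the right hand side of (10.44) is bounded from above by $Mg(e^{-\frac{c}{M}})$, independent of $s$, for $s \ge \frac{c}{M}$. Thus, from (10.44), we have

$$f'_{s,M}(\lambda) \le -(M-1) + \frac{M e^{-\frac{c}{M}} e^{\lambda}}{e^{-\frac{c}{M}}(e^{\lambda}-1)+1} = -M + 1 + \frac{M e^{\lambda}}{e^{\lambda} - 1 + e^{\frac{c}{M}}} = 1 + M\,\frac{1 - e^{\frac{c}{M}}}{e^{\lambda} - 1 + e^{\frac{c}{M}}}, \quad \text{for all } s \ge \frac{c}{M}.$$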
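Since $\lim_{M\to\infty} M\,\frac{1 - e^{\frac{c}{M}}}{e^{\lambda} - 1 + e^{\frac{c}{M}}} = -c e^{-\lambda}$, uniformly over $\lambda \in [0, 1]$, and since $c > 1$, it follows that there exists a $\lambda_0 > 0$ and an $M_0$ such that if $\lambda \in [0, \lambda_0]$ and $M \ge M_0$, then $f'_{s,M}(\lambda) \le -\frac{c-1}{2}$, for all $s \ge 1$. Since $f_{s,M}(0) = 0$, it then follows that $f_{s,M}(\lambda_0) \le -\frac{\lambda_0(c-1)}{2}$, for all $M \ge M_0$ and $s \ge 1$. Choosing $\lambda = \lambda_0$ in (10.43) and using this last inequality for $f_{s,M}(\lambda_0)$, we conclude that

$$P(\tilde Y_s \ge n-s) \le e^{-\frac{\lambda_0(c-1)}{2}s}, \quad \text{for } n \ge M_0 s, \ s \ge 1. \tag{10.45}$$

From (10.38), (10.39), and (10.45), we obtain the estimate

$$P(t \le T \le \epsilon n) \le \sum_{t \le s \le \epsilon n} e^{-\frac{\lambda_0(c-1)}{2}s} \le \sum_{s=t}^{\infty} e^{-\frac{\lambda_0(c-1)}{2}s} = \frac{e^{-\frac{\lambda_0(c-1)}{2}t}}{1 - e^{-\frac{\lambda_0(c-1)}{2}}}, \quad \text{if } \epsilon \le \frac{1}{M_0}. \tag{10.46}$$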


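Now (10.31), (10.36), (10.37), and (10.46) guarantee that for any $\epsilon \in (0, 1)$, we can choose $\tau_\epsilon$ and $n_\epsilon$ such that for all $n \ge n_\epsilon$, one has

$$\begin{aligned} &\alpha - \epsilon < P(T \le \tau_\epsilon) < \alpha + \epsilon; \\ &P\big(T > \tau_\epsilon,\ T \notin ((1-\alpha-\epsilon)n, (1-\alpha+\epsilon)n)\big) < \epsilon; \\ &1 - \alpha - 2\epsilon < P\big(T \in ((1-\alpha-\epsilon)n, (1-\alpha+\epsilon)n)\big) < 1 - \alpha + \epsilon, \end{aligned} \quad \text{for all } n \ge n_\epsilon. \tag{10.47}$$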


(The third set of inequalities above is a consequence of the first two sets of 
inequalities.) 

We recall that the above estimates have been obtained when $p = \frac{c}{n}$, with $c > 1$, and where $1 - \alpha = 1 - \alpha(c)$ is the unique root $z \in (0, 1)$ of the equation $1 - e^{-cz} - z = 0$. The reader can check that the above estimates hold uniformly for $c \in [c_1, c_2]$, for any $1 < c_1 < c_2$. Thus, consider as before a fixed $c > 1$ and $\alpha = \alpha(c)$, and let $\delta > 0$ satisfy $c - \delta > 1$. For $c' \in [c-\delta, c]$, let $\alpha' := \alpha(c')$. Then for all $\epsilon > 0$, there exists a $\tau_\epsilon > 0$ and an $n_\epsilon > 0$ such that for all $n \ge n_\epsilon$ and all $c' \in [c-\delta, c]$, one has for the graph $G(n, \frac{c'}{n})$,

$$\begin{aligned} &\alpha' - \epsilon < P(T \le \tau_\epsilon) < \alpha' + \epsilon; \\ &P\big(T > \tau_\epsilon,\ T \notin ((1-\alpha'-\epsilon)n, (1-\alpha'+\epsilon)n)\big) < \epsilon; \\ &1 - \alpha' - 2\epsilon < P\big(T \in ((1-\alpha'-\epsilon)n, (1-\alpha'+\epsilon)n)\big) < 1 - \alpha' + \epsilon, \end{aligned} \quad \text{for all } n \ge n_\epsilon. \tag{10.48}$$


Return now to our graph $G(n, \frac{c}{n})$, with $n$ considerably larger than the $n_\epsilon$ in (10.48). (We will quantify “considerably larger” a bit later on.) Recall that we started out by choosing arbitrarily some vertex $x$ in the graph $G(n, \frac{c}{n})$, and then applied our algorithm, obtaining $T$, which is the size of the connected component containing $x$. Call this the first step in a “game.” If it results in $T \le \tau_\epsilon$, say that a “draw” occurred on the first step. If it results in $(1-\alpha-\epsilon)n < T < (1-\alpha+\epsilon)n$, say that a “win” occurred on the first step. Otherwise, say that a “loss” occurred on the first step. If a win or a loss occurs on this first step, we stop the procedure and say that the game ended in a win or loss, respectively. If a draw occurs, then consider the remaining $n - T$ vertices that are not in the connected component containing $x$, and consider the corresponding edges. This gives a graph of size $n' = n - T$. Note that by the definition of the algorithm, there is no pair of points in this new graph that has already been checked by the algorithm. Therefore, the conditional edge probabilities for this new graph, conditioned on having implemented the algorithm, are as before, namely $\frac{c}{n}$, independently for each edge. This edge probability can be written as $p_{n'} = \frac{c'}{n'}$, where $c' = \frac{n-T}{n}c$. Now $T \le \tau_\epsilon$. Thus, if $n \ge n_\epsilon$ is sufficiently large, then $c' \in [c-\delta, c]$ and $n' = n - T \ge n_\epsilon$, so the estimates (10.48) (with $n$ replaced by $n'$) will hold for this new graph, which has $n'$ vertices and edge probabilities $p_{n'} = \frac{c'}{n'}$. Choose an arbitrary vertex $x_1$ from this new graph and repeat the above algorithm on the new graph. Let $T_1$ denote the random variable $T$ for this second step. If a win or a loss occurs on the second step of the game, then we stop the game and say that the game ended in a win or a loss, respectively. (Of course, here we define win, loss, and draw in terms of $T_1$, $n'$, and $\alpha'$ instead of $T$, $n$, and $\alpha$. However, the same $\tau_\epsilon$ is used.) If a draw occurs on this second step, then we consider the $n' - T_1 = n - T - T_1$ vertices that are neither in the connected component of $x$ nor of $x_1$. We continue like this for a maximum of $M_\epsilon$ steps, where $M_\epsilon$ is chosen sufficiently large to satisfy $(\alpha(c-\delta) + \epsilon)^{M_\epsilon} < \epsilon$. (We work with $\epsilon > 0$ sufficiently small so that $\alpha(c-\delta) + \epsilon < 1$.) The reason for this choice of $M_\epsilon$ will become clear below. If after $M_\epsilon$ steps, a win or a loss has not occurred, then we declare that the game has ended in a draw. Note that the smallest possible graph size that can ever be used in this game is $n - \tau_\epsilon(M_\epsilon - 1)$. The smallest modified value of $c$ that can ever be used is $\frac{n - \tau_\epsilon(M_\epsilon-1)}{n}c$. We can now quantify what we meant when we said at the outset of this paragraph that we are choosing $n$ “considerably larger” than $n_\epsilon$. We choose $n$ sufficiently large so that $n - \tau_\epsilon(M_\epsilon - 1) \ge n_\epsilon$ and so that $\frac{n - \tau_\epsilon(M_\epsilon-1)}{n}c \ge c - \delta$. Thus, the estimates in (10.48) are valid for all of the steps of the game.

It is easy to check that $\alpha = \alpha(c)$ is decreasing for $c > 1$. Thus, if the game ends in a win, then there is a connected component of size between $(1 - \alpha(c-\delta) - \epsilon)n$ and $(1 - \alpha(c) + \epsilon)n$. What is the probability that the game ends in a win? Let $W$ denote the event that the game ends in a win, let $D$ denote the event that it ends in a draw, and let $L$ denote the event that it ends in a loss. We have

$$P(W) = 1 - P(L) - P(D). \tag{10.49}$$

The game ends in a draw if there was a draw on $M_\epsilon$ consecutive steps. Since on any given step the probability of a draw is no greater than $\alpha(c-\delta) + \epsilon$, the probability of obtaining $M_\epsilon$ consecutive draws is no greater than $(\alpha(c-\delta) + \epsilon)^{M_\epsilon}$; so by the choice of $M_\epsilon$, we have

$$P(D) \le (\alpha(c-\delta) + \epsilon)^{M_\epsilon} < \epsilon. \tag{10.50}$$

Let $D^c$ denote the complement of $D$; that is, $D^c = W \cup L$. Obviously, we have $L = L \cap D^c$. Then we have

$$P(L) = P(L \cap D^c) = P(D^c)P(L \mid D^c) \le P(L \mid D^c). \tag{10.51}$$

If one played a game with three possible outcomes on each step (win, loss, or draw) with respective nonzero probabilities $p'$, $q'$, and $r'$, and the outcomes of all the steps were independent of one another, and one continued to play step after step until either a win or a loss occurred, then the probability of a win would be $\frac{p'}{p'+q'}$ and the probability of a loss would be $\frac{q'}{p'+q'}$ (Exercise 10.9). Conditioned on $D^c$, our game essentially reduces to this game. However, the probabilities of win and loss and draw are not exactly fixed, but can vary a little according to (10.48). Thus, we can conclude that

$$P(L \mid D^c) \le \frac{\epsilon}{1 - \alpha(c-\delta) - 2\epsilon + \epsilon} = \frac{\epsilon}{1 - \alpha(c-\delta) - \epsilon}. \tag{10.52}$$
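The win probability in the idealized three-outcome game can be checked numerically by summing over the step on which the first non-draw occurs. The sketch below is illustrative only: the function names and the sample values $0.2$, $0.1$, $0.7$ are assumptions, and the code does not replace the argument asked for in Exercise 10.9.

```python
def win_probability(p: float, q: float, r: float, max_steps: int = 10_000) -> float:
    """P(win) when steps repeat until the first non-draw; p + q + r must be 1."""
    if min(p, q, r) <= 0.0 or abs(p + q + r - 1.0) > 1e-12:
        raise ValueError("p, q, r must be positive and sum to 1")
    total, draws_so_far = 0.0, 1.0
    for _ in range(max_steps):
        total += draws_so_far * p   # all earlier steps were draws, then a win
        draws_so_far *= r
    return total

def loss_bound(alpha_low: float, eps: float) -> float:
    """Right-hand side of (10.52), with alpha_low standing for alpha(c - delta)."""
    return eps / (1.0 - alpha_low - eps)

win = win_probability(0.2, 0.1, 0.7)   # should be close to 0.2 / (0.2 + 0.1)
```

With a loss probability at most $\epsilon$ and a win probability at least $1 - \alpha(c-\delta) - 2\epsilon$ on each step, the ratio `loss / (win + loss)` is largest at those extreme values, which is where the bound computed by `loss_bound` comes from.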


From (10.49)-(10.52) we obtain

$$P(W) \ge 1 - \epsilon - \frac{\epsilon}{1 - \alpha(c-\delta) - \epsilon}. \tag{10.53}$$


In conclusion, we have demonstrated the following. Consider any $c > 1$ and any $\delta > 0$ such that $c - \delta > 1$. Then for each sufficiently small $\epsilon > 0$ and sufficiently large $n$ depending on $\epsilon$, with probability at least $1 - \epsilon - \frac{\epsilon}{1 - \alpha(c-\delta) - \epsilon}$ there will exist a connected component of $G(n, \frac{c}{n})$ of size between $(1 - \alpha(c-\delta) - \epsilon)n$ and $(1 - \alpha(c) + \epsilon)n$. If the connected component above, which has been shown to exist with probability close to 1 and which is of size around $(1-\alpha)n$, is in fact with probability close to 1 the largest connected component, then the above estimates prove (10.1), since by (10.25) the $\beta$ defined in the statement of the theorem is in fact $1 - \alpha$. Thus, to complete the proof of (10.1) and (10.2), it suffices to prove that with probability approaching 1 as $n \to \infty$, every other component of $G(n, \frac{c}{n})$ is of size $O(\log n)$, as $n \to \infty$. In fact, we will prove here the weaker result that with probability approaching 1 as $n \to \infty$, every other component is of size $o(n)$ as $n \to \infty$. In Exercise 10.10, the reader is guided through a proof that every other component is of size $O(\log n)$.
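A small simulation can make the conclusion tangible: in a sample of $G(n, \frac{c}{n})$ with $c > 1$, one component should hold roughly a $(1-\alpha)$ fraction of the vertices while the others are tiny. The sketch below is written in the spirit of the exploration idea used in the proof (each pair of vertices is tested at most once), but it is not a transcription of the chapter's algorithm; the function name, the seed, and the parameters $n = 1500$, $c = 2$ are assumptions made for illustration.

```python
import random

def component_sizes(n: int, p: float, rng: random.Random) -> list[int]:
    """Sample G(n, p) lazily and return its component sizes, largest first."""
    unseen = set(range(n))        # vertices not yet attached to any component
    sizes = []
    while unseen:
        active = [unseen.pop()]   # start a new component from an arbitrary vertex
        size = 0
        while active:
            v = active.pop()
            size += 1
            # Test every still-unseen vertex once against v; each test is an
            # independent coin flip with success probability p.
            found = [u for u in unseen if rng.random() < p]
            unseen.difference_update(found)
            active.extend(found)
        sizes.append(size)
    return sorted(sizes, reverse=True)

if __name__ == "__main__":
    n, c = 1_500, 2.0
    sizes = component_sizes(n, c / n, random.Random(0))
    print("largest fraction:", sizes[0] / n, "second largest size:", sizes[1])
```

For $c = 2$ the root of $1 - e^{-cz} - z = 0$ in $(0,1)$ is a little below $0.8$, so the largest fraction printed should be in that neighborhood, with the second largest component only a handful of vertices.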

To prove that every other component is of size $o(n)$ with probability approaching 1 as $n \to \infty$, assume to the contrary. Then for an unbounded sequence of $n$'s, the following holds. As above, with probability at least $1 - \epsilon - \frac{\epsilon}{1 - \alpha(c-\delta) - \epsilon}$, there will exist a connected component of $G(n, \frac{c}{n})$ of size between $(1 - \alpha(c-\delta) - \epsilon)n$ and $(1 - \alpha(c) + \epsilon)n$, and by our assumption, for some $\gamma > 0$, with probability at least $\gamma$, there will be another connected component of size at least $\gamma n$. We may take $\gamma < 1 - \alpha(c-\delta) - \epsilon$. But if this were true, then at the first step of our algorithm, when we randomly selected a vertex $x$, the probability that it would be in a connected component of size at least $\gamma n$ would be at least

$$\Big(1 - \epsilon - \frac{\epsilon}{1 - \alpha(c-\delta) - \epsilon}\Big) \frac{(1 - \alpha(c-\delta) - \epsilon)n}{n} + \gamma\,\frac{\gamma n}{n}.$$

For $\epsilon$ and $\delta$ sufficiently small, this number will be larger than $1 - \alpha(c) + \frac{\gamma^2}{2}$, in which case the algorithm would have to give $P(T \le \tau_\epsilon) \le \alpha(c) - \frac{\gamma^2}{2}$. However, for $\epsilon > 0$ sufficiently small, this contradicts the first line of (10.47). □

Exercise 10.1. This exercise refers to Remark 3 after Theorem 10.2. Prove that for any $\epsilon > 0$ and large $n$, the number of edges of $G_n(\frac{c}{n})$ is equal to $\frac{1}{2}cn + O(n^{\frac{1}{2}+\epsilon})$ with high probability. Show directly that $\beta(c) < \frac{c}{2}$, for $1 < c < 2$, where $\beta(c)$ is as in Theorem 10.2.

Exercise 10.2. Let $D_n$ denote the number of disconnected vertices in the Erdős–Rényi graph $G_n(p_n)$. For this exercise, it will be convenient to represent $D_n$ as a sum of indicator random variables. Let $D_{n,i}$ be equal to 1 if the vertex $i$ is disconnected and equal to 0 otherwise. Then $D_n = \sum_{i=1}^{n} D_{n,i}$.

(a) Calculate $ED_n$.

(b) Calculate $ED_n^2$. (Hint: Write $ED_n^2 = E\big(\sum_{i=1}^{n} D_{n,i}\big)\big(\sum_{j=1}^{n} D_{n,j}\big)$.)

Exercise 10.3. In this exercise, you are guided through a proof of the result noted in Remark 2 after Theorem 10.2, namely that:

if $p_n = \frac{\log n + c_n}{n}$, then as $n \to \infty$, the probability that the Erdős–Rényi graph $G_n(p_n)$ possesses at least one disconnected vertex approaches 0 if $\lim_{n\to\infty} c_n = \infty$, while for any $M$, the probability that it possesses at least $M$ disconnected vertices approaches 1 if $\lim_{n\to\infty} c_n = -\infty$.

Let $D_n$ be as in Exercise 10.2, with $p_n = \frac{\log n + c_n}{n}$.

(a) Use Exercise 10.2(a) to show that $\lim_{n\to\infty} ED_n$ equals 0 if $\lim_{n\to\infty} c_n = \infty$ and equals $\infty$ if $\lim_{n\to\infty} c_n = -\infty$. (Hint: Consider $\log ED_n$ and note that by Taylor's remainder theorem, $\log(1-x) = -x - \frac{x^2}{2(1-x^*)^2}$, for $0 < x < 1$, where $x^* = x^*(x)$ satisfies $0 < x^* < x$.)

(b) Use (a) to show that if $\lim_{n\to\infty} c_n = \infty$, then $\lim_{n\to\infty} P(D_n = 0) = 1$.

(c) Use Exercise 10.2(b) to calculate $ED_n^2$.

(d) Show that if $\lim_{n\to\infty} c_n = -\infty$, then the variance $\sigma^2(D_n)$ satisfies $\sigma^2(D_n) = o((ED_n)^2)$. (Hint: Recall that $\sigma^2(D_n) = ED_n^2 - (ED_n)^2$.)

(e) Use Chebyshev's inequality with (a) and (d) to conclude that if $\lim_{n\to\infty} c_n = -\infty$, then for any $M$, $\lim_{n\to\infty} P(D_n \ge M) = 1$.

Exercise 10.4. Recall from Chap. 5 that the probability generating function $P_X(s)$ of a nonnegative random variable $X$ taking integral values is defined by

$$P_X(s) = Es^X = \sum_{i=0}^{\infty} s^i P(X = i).$$

The probability generating function of a random variable $X$ uniquely characterizes its distribution, because $\frac{P_X^{(i)}(0)}{i!} = P(X = i)$.

(a) Let $X \sim \text{Bin}(n, p)$. Show that $P_X(s) = (ps + 1 - p)^n$.

(b) Let $Z \sim \text{Bin}(n, p)$, and let $Y \sim \text{Bin}(Z, p')$, by which is meant that conditioned on $Z = m$, the random variable $Y$ is distributed according to $\text{Bin}(m, p')$. Calculate $P_Y(s)$ by writing

$$P_Y(s) = Es^Y = \sum_{m=0}^{n} E(s^Y \mid Z = m) P(Z = m),$$

and conclude that $Y \sim \text{Bin}(n, pp')$. Conclude from this that (10.7) and (10.9) imply (10.6).

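Exercise 10.5. Let $f(\lambda) = e^{-\lambda p_0}(pe^{\lambda} + 1 - p)$, with $0 < p < p_0 < 1$. Show that $\inf_{\lambda \ge 0} f(\lambda)$ is attained at some $\lambda_0 > 0$ and that $f(\lambda_0) = \big(\frac{p}{p_0}\big)^{p_0} \big(\frac{1-p}{1-p_0}\big)^{1-p_0} \in (0, 1)$.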

Exercise 10.6. If $X \sim \text{Bin}(n, p)$, then $X$ can be represented as $X = \sum_{i=1}^{n} B_i$, where $\{B_i\}_{i=1}^{n}$ are independent and identically distributed random variables distributed according to the Bernoulli distribution with parameter $p$; that is, $P(B_i = 1) = 1 - P(B_i = 0) = p$.

(a) Use the above representation to prove that

$$\text{if } X_i \sim \text{Bin}(n_i, p),\ i = 1, 2, \text{ and } n_1 \ge n_2, \text{ then } P(X_1 \ge k) \ge P(X_2 \ge k), \text{ for all integers } k \ge 0, \tag{10.54}$$

and that

$$\text{if } X_i \sim \text{Bin}(n, p_i),\ i = 1, 2, \text{ and } p_1 \ge p_2, \text{ then } P(X_1 \ge k) \ge P(X_2 \ge k), \text{ for all integers } k \ge 0. \tag{10.55}$$

(Hint: For (10.54), represent $X_1$ using the random variables $\{B_i\}_{i=1}^{n_1}$ and represent $X_2$ using the first $n_2$ of these very same random variables. For (10.55), let $\{U_i\}_{i=1}^{n}$ be independent and identically distributed random variables, distributed according to the uniform distribution on $[0, 1]$; that is, $P(a \le U_i \le b) = b - a$, for $0 \le a < b \le 1$. Define random variables $\{B_i^{(1)}\}_{i=1}^{n}$ and $\{B_i^{(2)}\}_{i=1}^{n}$ by the formulas

$$B_i^{(1)} = \begin{cases} 1, & \text{if } U_i \le p_1; \\ 0, & \text{if } U_i > p_1, \end{cases} \qquad B_i^{(2)} = \begin{cases} 1, & \text{if } U_i \le p_2; \\ 0, & \text{if } U_i > p_2. \end{cases}$$

Now represent $X_1$ and $X_2$ through $\{B_i^{(1)}\}_{i=1}^{n}$ and $\{B_i^{(2)}\}_{i=1}^{n}$ respectively. This method is called coupling.)

(b) Prove (10.54) and (10.55) directly from the fact that if $X \sim \text{Bin}(n, p)$, then for $0 \le k \le n$, one has $P(X \ge k) = \sum_{j=k}^{n} \binom{n}{j} p^j (1-p)^{n-j}$.
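The coupling construction in the hint to part (a) can be tried out on a computer: both binomial variables are built from one shared list of uniform random numbers. The following is a sketch only; the function name, the seed, and the parameter values are hypothetical choices, and the code illustrates the construction without proving (10.55).

```python
import random

def coupled_binomials(n: int, p1: float, p2: float, rng: random.Random) -> tuple[int, int]:
    """Build X1 ~ Bin(n, p1) and X2 ~ Bin(n, p2) from the same uniforms U_1..U_n."""
    uniforms = [rng.random() for _ in range(n)]
    x1 = sum(u <= p1 for u in uniforms)   # sum of the indicators B_i^(1)
    x2 = sum(u <= p2 for u in uniforms)   # sum of the indicators B_i^(2)
    return x1, x2

rng = random.Random(2024)
samples = [coupled_binomials(50, 0.6, 0.4, rng) for _ in range(1_000)]
```

Because $U_i \le p_2$ forces $U_i \le p_1$ when $p_1 \ge p_2$, every sampled pair satisfies `x1 >= x2`, so the event $\{X_2 \ge k\}$ is contained in $\{X_1 \ge k\}$ on this common probability space.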

Exercise 10.7. If $\{X_t\}_{t=0}^{\infty}$ is a Galton–Watson branching process of the type described at the beginning of Sect. 10.5, show that $EX_{t+1} = \mu EX_t$, where $\mu$ is the mean number of offspring of a particle. (Hint: Use induction and conditional expectation.)

Exercise 10.8. Prove (10.37) by the method used to prove (10.36). 

Exercise 10.9. Prove that if one plays a game with three possible outcomes on each step (win, loss, or draw) with respective nonzero probabilities $p'$, $q'$, and $r'$, and the outcomes of all the steps are independent of one another, and one continues to play step after step until either a win or a loss occurs, then the probability of a win is $\frac{p'}{p'+q'}$ and the probability of a loss is $\frac{q'}{p'+q'}$.

Exercise 10.10. In the proof of Theorem 10.2, after the algorithm for finding the connected component of a vertex was implemented a maximum of $M_\epsilon$ times, and a component with size around $(1-\alpha)n$ was found with probability close to 1, the final paragraph of the proof of the theorem gave a proof that with probability approaching 1 as $n \to \infty$, all other components are of size $o(n)$ as $n \to \infty$. To prove the stronger result, as in the statement of Theorem 10.2, that with probability approaching 1 as $n \to \infty$ all other components are of size $O(\log n)$, consider starting the algorithm all over again after the component of size around $(1-\alpha)n$ has been discovered. The number of vertices left is around $n' = \alpha n$ and the edge probability is still $\frac{c}{n}$, which we can write as $\frac{C}{n'}$ with $C \approx c\alpha$. If $C < 1$, then the method of proof of Theorem 10.1 shows that with probability approaching 1 as $n \to \infty$ all components are of size $O(\log n') = O(\log n)$ as $n \to \infty$. To show that $C < 1$, it suffices to show that $c\alpha < 1$. To prove this, use the following facts: (1) $xe^{-x}$ increases in $[0, 1)$ and decreases in $(1, \infty)$, so for $c > 1$, there exists a unique $d \in (0, 1)$ such that $de^{-d} = ce^{-c}$; (2) $\alpha = e^{c(\alpha-1)}$.


Chapter Notes 

The context in which Theorems 10.1 and 10.2 were originally proven by Erdős and Rényi in 1960 [18] is a little different from the context presented here. Let $N := \binom{n}{2}$. Define $G(n, M)$, $0 \le M \le N$, to be the random graph with $n$ vertices and exactly $M$ edges, where the $M$ edges are selected uniformly at random from the $N$ possible edges. One can consider an evolving random graph $\{G(n, t)\}_{t=0}^{N}$. By definition, $G(n, 0)$ is the graph on $n$ vertices with no edges. Then sequentially, given $G(n, t)$, for $0 \le t \le N-1$, one obtains the graph $G(n, t+1)$ by choosing at random from the complete graph $K_n$ one of the edges that is not in $G(n, t)$ and adjoining it to $G(n, t)$. Erdős and Rényi looked at evolving graphs of the form $G(n, t_n)$, with $t_n = [\frac{cn}{2}]$. They showed that if $c < 1$, then with probability approaching 1 as $n \to \infty$, the largest component of $G(n, t_n)$ is of size $O(\log n)$, while if $c > 1$, then with probability approaching 1 as $n \to \infty$ there is one component of size approximately $\beta(c) \cdot n$, and all other components are of size $O(\log n)$. To see how this connects up to the version given in this chapter, note that the expected number of edges in the graph $G_n(\frac{c}{n})$ is $\frac{c}{n}\binom{n}{2} = \frac{c(n-1)}{2}$. A detailed study of the borderline case, when $t_n \sim \frac{n}{2}$ as $n \to \infty$, was undertaken by Bollobás [8]. Our proofs of Theorems 10.1 and 10.2 are along the lines of the method sketched briefly in the book of Alon and Spencer [2]. We are not aware in the literature of a complete proof of Theorems 10.1 and 10.2 with all the details.

The large deviations bound in Proposition 10.2 is actually tight. That is, in part (i), where $p_0 > p$, for any $\epsilon > 0$, one has for sufficiently large $n$, $P(S_n \ge p_0 n) \ge e^{-(\kappa(p_0, p) + \epsilon)n}$. Thus, in particular, $\lim_{n\to\infty} \frac{1}{n} \log P(S_n \ge p_0 n) = -\kappa(p_0, p)$. Similarly, in part (ii), where $p_0 < p$, $\lim_{n\to\infty} \frac{1}{n} \log P(S_n \le p_0 n) = -\kappa(p_0, p)$. Consider two measures, $\mu$ and $\mu_0$, defined on a finite or countably infinite set $A$. Then $H(\mu_0; \mu) := \sum_{x \in A} \mu_0(x) \log \frac{\mu_0(x)}{\mu(x)}$ is called the relative entropy of $\mu_0$ with respect to $\mu$. It plays a fundamental role in the theory of large deviations. In the case that $A$ is a two-point set, say $A = \{0, 1\}$, and $\mu(\{1\}) = 1 - \mu(\{0\}) = p$ and $\mu_0(\{1\}) = 1 - \mu_0(\{0\}) = p_0$, one has $H(\mu_0; \mu) = \kappa(p_0, p)$. For more on large deviations, see the book by Dembo and Zeitouni [13].
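The tightness statement can be illustrated numerically by computing the exact binomial tail in log space and comparing its exponential rate with the two-point relative entropy. This is a sketch under assumptions: the formula used for `kappa` is the two-point relative entropy described above, and the values $n = 2000$, $p = 0.3$, $p_0 = 0.5$ as well as the function names are illustrative choices.

```python
import math

def kappa(p0: float, p: float) -> float:
    """Relative entropy of Ber(p0) with respect to Ber(p)."""
    return p0 * math.log(p0 / p) + (1 - p0) * math.log((1 - p0) / (1 - p))

def log_upper_tail(n: int, p: float, p0: float) -> float:
    """log P(S_n >= p0*n) for S_n ~ Bin(n, p), computed with log-sum-exp."""
    logs = [
        math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
        + j * math.log(p) + (n - j) * math.log(1 - p)
        for j in range(math.ceil(p0 * n), n + 1)
    ]
    top = max(logs)   # factor out the largest term to avoid underflow
    return top + math.log(sum(math.exp(v - top) for v in logs))

n, p, p0 = 2_000, 0.3, 0.5
rate = log_upper_tail(n, p, p0) / n   # to be compared with -kappa(p0, p)
```

The computed `rate` sits slightly below `-kappa(p0, p)`, consistent with the upper bound $e^{-\kappa(p_0,p)n}$ on the tail, and the difference shrinks as `n` is increased.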

For some basic results on the Galton–Watson branching process, using probabilistic methods, see the advanced probability textbook of Durrett [16]. Two standard texts on branching processes are the books of Harris [24] and of Athreya and Ney [7].


Appendix A 

A Quick Primer on Discrete Probability 


In this appendix, we develop some basic ideas in discrete probability theory. We 
note from the outset that some of the definitions given here are no longer correct in 
the setting of continuous probability theory. 

Let $\Omega$ be a finite or countably infinite set, and let $2^{\Omega}$ denote the set of subsets of $\Omega$. An element $A \in 2^{\Omega}$ is simply a subset of $\Omega$, but in the language of probability it is called an event. A probability measure on $\Omega$ is a function $P : 2^{\Omega} \to [0, 1]$ satisfying $P(\emptyset) = 0$, $P(\Omega) = 1$ and which is $\sigma$-additive; that is, for any $1 \le N \le \infty$, one has $P(\bigcup_{n=1}^{N} A_n) = \sum_{n=1}^{N} P(A_n)$, whenever the events $\{A_n\}_{n=1}^{N}$ are disjoint. From this $\sigma$-additivity, it follows that $P$ is uniquely determined by $\{P(\{x\})\}_{x \in \Omega}$. Using the $\sigma$-additivity on disjoint events, it is not hard to prove that $P$ is $\sigma$-sub-additive on arbitrary events; that is, $P(\bigcup_{n=1}^{N} A_n) \le \sum_{n=1}^{N} P(A_n)$, for arbitrary events $\{A_n\}_{n=1}^{N}$. See Exercise A.1. The pair $(\Omega, P)$ is called a probability space.

If $C$ and $D$ are events and $P(C) > 0$, then the conditional probability of $D$ given $C$ is denoted by $P(D \mid C)$ and is defined by

$$P(D \mid C) = \frac{P(C \cap D)}{P(C)}.$$

Note that $P(\cdot \mid C)$ is itself a probability measure on $\Omega$. Two events $C$ and $D$ are called independent if $P(C \cap D) = P(C)P(D)$. Clearly then, $C$ and $D$ are independent if either $P(C) = 0$ or $P(D) = 0$. If $P(C), P(D) > 0$, it is easy to check that independence is equivalent to either of the following two equalities: $P(D \mid C) = P(D)$ or $P(C \mid D) = P(C)$. Consider a collection $\{C_n\}_{n=1}^{N}$ of events, with $1 \le N \le \infty$. This collection of events is said to be independent if for any finite subset $\{C_{n_j}\}_{j=1}^{m}$ of the events, one has $P(\bigcap_{j=1}^{m} C_{n_j}) = \prod_{j=1}^{m} P(C_{n_j})$.

Let $(\Omega, P)$ be a probability space. A function $X : \Omega \to \mathbb{R}$ is called a (discrete, real-valued) random variable. For $B \subset \mathbb{R}$, we write $\{X \in B\}$ to denote the event $X^{-1}(B) = \{\omega \in \Omega : X(\omega) \in B\}$, the inverse image of $B$. When considering the probability of the event $\{X \in B\}$ or the event $\{X = x\}$, we write $P(X \in B)$ or $P(X = x)$, instead of $P(\{X \in B\})$ or $P(\{X = x\})$. The distribution of the random variable $X$ is the probability measure $\mu_X$ on $\mathbb{R}$ defined by $\mu_X(B) = P(X \in B)$, for $B \subset \mathbb{R}$. The function $p_X(x) := P(X = x)$ is called the probability function or the discrete density function for $X$.

The expected value or expectation $EX$ of a random variable $X$ is defined by

$$EX = \sum_{x \in \mathbb{R}} x P(X = x) = \sum_{x \in \mathbb{R}} x\, p_X(x), \quad \text{if } \sum_{x \in \mathbb{R}} |x| P(X = x) < \infty.$$

Note that the set of $x \in \mathbb{R}$ for which $P(X = x) > 0$ is either finite or countably infinite; thus, these summations are well defined. We frequently denote $EX$ by $\mu$. If $P(X \ge 0) = 1$ and the condition above in the definition of $EX$ does not hold, then we write $EX = \infty$. In the sequel, when we say that the expectation of $X$ “exists,” we mean that $\sum_{x \in \mathbb{R}} |x| P(X = x) < \infty$.

Given a function $f : \mathbb{R} \to \mathbb{R}$ and a random variable $X$, we can define a new random variable $Y = f(X)$. One can calculate $EY$ according to the definition of expectation above or in the following equivalent way:

$$EY = \sum_{x \in \mathbb{R}} f(x) P(X = x), \quad \text{if } \sum_{x \in \mathbb{R}} |f(x)| P(X = x) < \infty.$$

For $n \in \mathbb{N}$, the $n$th moment of $X$ is defined by

$$EX^n = \sum_{x \in \mathbb{R}} x^n P(X = x), \quad \text{if } \sum_{x \in \mathbb{R}} |x|^n P(X = x) < \infty.$$

If $\mu = EX$ exists, then one defines the variance of $X$, denoted by $\sigma^2$ or $\sigma^2(X)$ or $\text{Var}(X)$, by

$$\sigma^2 = E(X - \mu)^2 = \sum_{x \in \mathbb{R}} (x - \mu)^2 P(X = x).$$

Of course, it is possible to have $\sigma^2 = \infty$. It is easy to check that

$$\sigma^2(X) = EX^2 - \mu^2. \tag{A.1}$$

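Chebyshev's inequality is a fundamental inequality involving the expected value and the variance.

Proposition A.1 (Chebyshev's Inequality). Let $X$ be a random variable with expectation $\mu$ and finite variance $\sigma^2$. Then for all $\lambda > 0$,

$$P(|X - \mu| \ge \lambda) \le \frac{\sigma^2}{\lambda^2}.$$

Proof.

$$P(|X - \mu| \ge \lambda) = \sum_{x \in \mathbb{R}: |x - \mu| \ge \lambda} P(X = x) \le \sum_{x \in \mathbb{R}: |x - \mu| \ge \lambda} \frac{(x-\mu)^2}{\lambda^2} P(X = x) \le \sum_{x \in \mathbb{R}} \frac{(x-\mu)^2}{\lambda^2} P(X = x) = \frac{\sigma^2}{\lambda^2}.$$

□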

Let $\{X_j\}_{j=1}^{n}$ be a finite collection of random variables on a probability space $(\Omega, P)$. We call $X = (X_1, \ldots, X_n)$ a random vector. The joint probability function of these random variables, or equivalently, the probability function of the random vector, is given by

$$p_X(x) = p_X(x_1, \ldots, x_n) := P(X_1 = x_1, \ldots, X_n = x_n) = P(X = x), \quad x_i \in \mathbb{R},\ i = 1, \ldots, n, \ \text{where } x = (x_1, \ldots, x_n).$$

It follows that $\sum_{j \in [n] - \{i\}} \sum_{x_j \in \mathbb{R}} p_X(x) = P(X_i = x_i)$. For any function $H : \mathbb{R}^n \to \mathbb{R}$, we define

$$EH(X) = \sum_{x \in \mathbb{R}^n} H(x) p_X(x), \quad \text{if } \sum_{x \in \mathbb{R}^n} |H(x)| p_X(x) < \infty.$$

In particular then, if $EX_j$ exists, it can be written as $EX_j = \sum_{x \in \mathbb{R}^n} x_j\, p_X(x)$. Similarly, if $EX_k$ exists, for all $k$, then we have

$$E \sum_{k=1}^{n} c_k X_k = \sum_{x \in \mathbb{R}^n} \Big(\sum_{k=1}^{n} c_k x_k\Big) p_X(x) = \sum_{k=1}^{n} c_k \sum_{x \in \mathbb{R}^n} x_k\, p_X(x).$$

It follows from this that the expectation is linear; that is, if $EX_k$ exists for $k = 1, \ldots, n$, then

$$E \sum_{k=1}^{n} c_k X_k = \sum_{k=1}^{n} c_k EX_k,$$

for any real numbers $\{c_k\}_{k=1}^{n}$.

Let $\{X_j\}_{j=1}^{N}$ be a collection of random variables on a probability space $(\Omega, P)$, where $1 \le N \le \infty$. The random variables are called independent if for every finite $n \le N$, one has

$$P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n) = \prod_{j=1}^{n} P(X_j = x_j), \quad \text{for all } x_j \in \mathbb{R},\ j = 1, 2, \ldots, n.$$

Let $\{f_i\}_{i=1}^{n}$ be real-valued functions with $f_i$ defined at least on the set $\{x \in \mathbb{R} : P(X_i = x) > 0\}$. Assume that $E|f_i(X_i)| < \infty$, for $i = 1, \ldots, n$. From the definition of independence it is easy to show that if $\{X_i\}_{i=1}^{n}$ are independent, then

$$E \prod_{i=1}^{n} f_i(X_i) = \prod_{i=1}^{n} E f_i(X_i). \tag{A.2}$$

The variance is of course not linear. However the variance of a sum of independent random variables is equal to the sum of the variances of the random variables:

If $\{X_i\}_{i=1}^{n}$ are independent random variables, then

$$\sigma^2\Big(\sum_{i=1}^{n} X_i\Big) = \sum_{i=1}^{n} \sigma^2(X_i). \tag{A.3}$$

It suffices to prove (A.3) for $n = 2$ and then use induction. Let $\mu_i = EX_i$, $i = 1, 2$. We have

$$\sigma^2(X_1 + X_2) = E\big(X_1 + X_2 - E(X_1 + X_2)\big)^2 = E\big((X_1 - \mu_1) + (X_2 - \mu_2)\big)^2 = E(X_1 - \mu_1)^2 + E(X_2 - \mu_2)^2 + 2E(X_1 - \mu_1)(X_2 - \mu_2) = \sigma^2(X_1) + \sigma^2(X_2),$$

where the last equality follows because (A.2) shows that $E(X_1 - \mu_1)(X_2 - \mu_2) = E(X_1 - \mu_1)E(X_2 - \mu_2) = 0$.

Chebyshev's inequality and (A.3) allow for an exceedingly short proof of an important result: the weak law of large numbers for sums of independent, identically distributed (IID) random variables.

Theorem A.1. Let $\{X_n\}_{n=1}^{\infty}$ be a sequence of independent, identically distributed random variables and assume that their common variance $\sigma^2$ is finite. Denote their common expectation by $\mu$. Let $S_n = \sum_{j=1}^{n} X_j$. Then for any $\epsilon > 0$,

$$\lim_{n\to\infty} P\Big(\Big|\frac{S_n}{n} - \mu\Big| \ge \epsilon\Big) = 0.$$

Proof. We have $ES_n = n\mu$, and since the random variables are independent and identically distributed, it follows from (A.3) that $\sigma^2(S_n) = n\sigma^2$. Now applying Chebyshev's inequality to $S_n$ with $\lambda = n\epsilon$ gives

$$P(|S_n - n\mu| \ge n\epsilon) \le \frac{n\sigma^2}{(n\epsilon)^2},$$

which proves the theorem. □
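For Bernoulli summands the probability in Theorem A.1 can be computed exactly and set beside the Chebyshev bound $\frac{n\sigma^2}{(n\epsilon)^2}$ used in the proof. The sketch below assumes $X_j \sim \text{Ber}(p)$ with $p = 0.3$ and $\epsilon = 0.05$; these values and the function names are illustrative and not part of the text.

```python
import math

def deviation_probability(n: int, p: float, eps: float) -> float:
    """Exact P(|S_n/n - p| >= eps) for S_n ~ Bin(n, p), summed in log space."""
    total = 0.0
    for j in range(n + 1):
        if abs(j - n * p) >= n * eps:
            log_pmf = (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                       + j * math.log(p) + (n - j) * math.log(1 - p))
            total += math.exp(log_pmf)
    return total

def chebyshev_bound(n: int, p: float, eps: float) -> float:
    """The bound n*sigma^2 / (n*eps)^2 with sigma^2 = p(1 - p)."""
    return p * (1 - p) / (n * eps**2)

p, eps = 0.3, 0.05
# Each row holds (n, exact probability, Chebyshev bound).
table = [(n, deviation_probability(n, p, eps), chebyshev_bound(n, p, eps))
         for n in (100, 400, 1_600)]
```

In each row the exact probability lies below the bound, and both tend to 0 as $n$ grows. The exact probability decays much faster, which is in line with the large deviations estimates of Chap. 10.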

Remark. The weak law of large numbers is a first moment result. It holds even 
without the finite variance assumption, but the proof is much more involved. 

The above weak law of large numbers is actually a particular case of the 
following weak law of large numbers. 

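Proposition A.2. Let $\{Y_n\}_{n=1}^{\infty}$ be random variables. Assume that

$$\sigma^2(Y_n) = o((EY_n)^2), \quad \text{as } n \to \infty.$$

Then for any $\epsilon > 0$,

$$\lim_{n\to\infty} P\Big(\Big|\frac{Y_n}{EY_n} - 1\Big| \ge \epsilon\Big) = 0.$$

Proof. By Chebyshev's inequality, we have

$$P\Big(\Big|\frac{Y_n}{EY_n} - 1\Big| \ge \epsilon\Big) = P(|Y_n - EY_n| \ge \epsilon |EY_n|) \le \frac{\sigma^2(Y_n)}{(\epsilon EY_n)^2}.$$

□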

If $X$ and $Y$ are random variables on a probability space $(\Omega, P)$, and if $P(X = x) > 0$, then the conditional probability function of $Y$ given $X = x$ is defined by

$$p_{Y|X}(y \mid x) := P(Y = y \mid X = x) = \frac{P(X = x, Y = y)}{P(X = x)}.$$

The conditional expectation of $Y$ given $X = x$ is defined by

$$E(Y \mid X = x) = \sum_{y \in \mathbb{R}} y P(Y = y \mid X = x) = \sum_{y \in \mathbb{R}} y\, p_{Y|X}(y \mid x), \quad \text{if } \sum_{y \in \mathbb{R}} |y| P(Y = y \mid X = x) < \infty.$$

It is easy to verify that

$$EY = \sum_{x \in \mathbb{R}} E(Y \mid X = x) P(X = x),$$

where $E(Y \mid X = x)P(X = x) := 0$, if $P(X = x) = 0$.

A random variable $X$ that takes on only the two values 0 and 1, with $P(X = 1) = p$ and $P(X = 0) = 1 - p$ for some $p \in [0, 1]$, is called a Bernoulli random variable. One writes $X \sim \text{Ber}(p)$. It is trivial to check that $EX = p$ and $\sigma^2(X) = p(1-p)$.

Let $n \in \mathbb{N}$ and let $p \in [0, 1]$. A random variable $X$ satisfying

$$P(X = j) = \binom{n}{j} p^j (1-p)^{n-j}, \quad j = 0, 1, \ldots, n,$$

is called a binomial random variable, and one writes $X \sim \text{Bin}(n, p)$. The random variable $X$ can be thought of as the number of “successes” in $n$ independent trials, where on each trial there are two possible outcomes, “success” and “failure,” and the probability of “success” is $p$ on each trial. Letting $\{Z_i\}_{i=1}^{n}$ be independent, identically distributed random variables distributed according to $\text{Ber}(p)$, it follows that $X$ can be realized as $X = \sum_{i=1}^{n} Z_i$. From the formula for the expected value and variance of a Bernoulli random variable, and from the linearity of the expectation and (A.3), the above representation immediately yields $EX = np$ and $\sigma^2(X) = np(1-p)$.

A random variable $X$ satisfying

$$P(X = n) = e^{-\lambda} \frac{\lambda^n}{n!}, \quad n = 0, 1, \ldots,$$

where $\lambda > 0$, is called a Poisson random variable, and one writes $X \sim \text{Pois}(\lambda)$. One can check easily that $EX = \lambda$ and $\sigma^2(X) = \lambda$.

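Proposition A.3 (Poisson Approximation to the Binomial Distribution). For $n \in \mathbb{N}$ and $p \in [0, 1]$, let $X_{n,p} \sim \text{Bin}(n, p)$. For $\lambda > 0$, let $X_\lambda \sim \text{Pois}(\lambda)$. Then

$$\lim_{n \to \infty,\, p \to 0,\, np \to \lambda} P(X_{n,p} = j) = P(X_\lambda = j), \quad j = 0, 1, \ldots. \tag{A.4}$$

Proof. By assumption, we have $p = \frac{\lambda_n}{n}$, where $\lim_{n\to\infty} \lambda_n = \lambda$. We have

$$P(X_{n,p} = j) = \binom{n}{j} p^j (1-p)^{n-j} = \frac{n(n-1)\cdots(n-j+1)}{j!} \Big(\frac{\lambda_n}{n}\Big)^j \Big(1 - \frac{\lambda_n}{n}\Big)^{n-j} = \frac{1}{j!}\, \frac{n(n-1)\cdots(n-j+1)}{n^j}\, \lambda_n^j \Big(1 - \frac{\lambda_n}{n}\Big)^{n-j};$$

thus,

$$\lim_{n \to \infty,\, p \to 0,\, np \to \lambda} P(X_{n,p} = j) = e^{-\lambda} \frac{\lambda^j}{j!} = P(X_\lambda = j).$$

□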
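The convergence in (A.4) is easy to observe numerically by fixing $\lambda$, setting $p = \frac{\lambda}{n}$, and comparing the two probability functions for increasing $n$. The sketch below uses $\lambda = 2$ and compares only the values $j = 0, \ldots, 10$; these choices and the function names are assumptions made for illustration.

```python
import math

def binomial_pmf(n: int, p: float, j: int) -> float:
    """P(X = j) for X ~ Bin(n, p)."""
    return math.comb(n, j) * p**j * (1 - p) ** (n - j)

def poisson_pmf(lam: float, j: int) -> float:
    """P(X = j) for X ~ Pois(lam)."""
    return math.exp(-lam) * lam**j / math.factorial(j)

def max_pmf_gap(n: int, lam: float, j_max: int = 10) -> float:
    """Largest |P(X_{n,p} = j) - P(X_lam = j)| over j = 0..j_max, with p = lam/n."""
    p = lam / n
    return max(abs(binomial_pmf(n, p, j) - poisson_pmf(lam, j)) for j in range(j_max + 1))

gaps = [max_pmf_gap(n, lam=2.0) for n in (10, 100, 1_000)]
```

The entries of `gaps` decrease as $n$ grows, which is the pointwise convergence asserted in (A.4) restricted to the first few values of $j$.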




Equation (A.4) is an example of weak convergence of random variables or distributions. In general, if $\{X_n\}_{n=1}^{\infty}$ are random variables with distributions $\{\mu_{X_n}\}_{n=1}^{\infty}$, and $X$ is a random variable with distribution $\mu_X$, then we say that $X_n$ converges weakly to $X$, or $\mu_{X_n}$ converges weakly to $\mu_X$, if $\lim_{n\to\infty} P(X_n \le x) = P(X \le x)$, for all $x \in \mathbb{R}$ for which $P(X = x) = 0$, or equivalently, if $\lim_{n\to\infty} \mu_{X_n}((-\infty, x]) = \mu_X((-\infty, x])$, for all $x \in \mathbb{R}$ for which $\mu_X(\{x\}) = 0$. Thus, for example, if $P(X_n = \frac{1}{n}) = P(X_n = 1 + \frac{1}{n}) = \frac{1}{2}$, for $n = 1, 2, \ldots$, and $P(X = 0) = P(X = 1) = \frac{1}{2}$, then $X_n$ converges weakly to $X$ since $\lim_{n\to\infty} P(X_n \le x) = P(X \le x)$, for all $x \in \mathbb{R} - \{0, 1\}$. See also Exercise A.4.

Exercise A.1. Use the $\sigma$-additivity property of probability measures on disjoint sets to prove $\sigma$-sub-additivity on arbitrary sets: that is, $P(\bigcup_{n=1}^{N} A_n) \le \sum_{n=1}^{N} P(A_n)$, for arbitrary events $\{A_n\}_{n=1}^{N}$, where $1 \le N \le \infty$. (Hint: Rewrite $\bigcup_{n=1}^{N} A_n$ as a disjoint union $\bigcup_{n=1}^{N} B_n$ by letting $B_1 = A_1$, $B_2 = A_2 - A_1$, $B_3 = A_3 - A_2 - A_1$, etc.)

Exercise A.2. Prove that $P(A_1 \cup A_2) = P(A_1) + P(A_2) - P(A_1 \cap A_2)$, for arbitrary
events $A_1, A_2$. Then prove more generally that for any finite $n$ and arbitrary events
$\{A_k\}_{k=1}^n$, one has

$$P(\cup_{k=1}^n A_k) = \sum_{1 \le i \le n} P(A_i) - \sum_{1 \le i < j \le n} P(A_i \cap A_j) + \sum_{1 \le i < j < k \le n} P(A_i \cap A_j \cap A_k) - \cdots + (-1)^{n-1} P(A_1 \cap A_2 \cap \cdots \cap A_n).$$

This result is known as the principle of inclusion-exclusion.
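Before attempting the proof, it can help to see the identity hold on a concrete case. The following sketch verifies inclusion-exclusion exactly for three events under the uniform probability measure on a ten-point sample space; the particular sets and the function names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations

def union_probability(events, omega_size):
    """Exact P(union of events) under the uniform measure on omega_size points."""
    return Fraction(len(set().union(*events)), omega_size)

def inclusion_exclusion(events, omega_size):
    """Alternating sum over all non-empty sub-collections of the events."""
    total = Fraction(0)
    for r in range(1, len(events) + 1):
        sign = (-1) ** (r - 1)
        for group in combinations(events, r):
            common = set.intersection(*group)
            total += sign * Fraction(len(common), omega_size)
    return total

# Illustrative events inside the sample space {0, 1, ..., 9} (an assumption).
events = [{0, 1, 2, 3}, {2, 3, 4, 5}, {0, 3, 5, 6, 7}]
omega_size = 10
print(union_probability(events, omega_size))
print(inclusion_exclusion(events, omega_size))
```

Exact rational arithmetic is used so that the two sides can be compared with equality rather than a tolerance.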

Exercise A.3. Let $(\Omega, P)$ be a probability space and let $R \ge 2$ be an integer. For
$A \subset \Omega$, recall that the complement $A^c$ of $A$ is defined by $A^c = \Omega - A$. Prove that
if the events $\{A_k\}_{k=1}^R$ are independent, then the complementary events $\{A_k^c\}_{k=1}^R$ are
also independent. (Hint: By the definition of independence, we have

$$P(\cap_{j=1}^{\ell} B_j) = \prod_{j=1}^{\ell} P(B_j), \text{ for any } \ell \le R \text{ and any sub-collection } \{B_j\}_{j=1}^{\ell} \text{ of } \{A_k\}_{k=1}^R. \tag{A.5}$$

Using this, we need to prove that $P(\cap_{j=1}^{\ell} B_j^c) = \prod_{j=1}^{\ell} P(B_j^c)$, for any sub-collection
$\{B_j^c\}_{j=1}^{\ell}$ of $\{A_k^c\}_{k=1}^R$. Let $p_j = P(B_j)$ and $p = P(\cap_{j=1}^{\ell} B_j^c)$. Then
we need to prove that $p = \prod_{j=1}^{\ell} (1 - p_j)$. Write

$$\prod_{j=1}^{\ell} (1 - p_j) = 1 - \sum_{1 \le i \le \ell} p_i + \sum_{1 \le i < j \le \ell} p_i p_j - \cdots,$$

and use (A.5) along with the principle of inclusion-exclusion, which appears in
Exercise A.2.)

Exercise A.4. Using (A.4), show that

$$\lim_{n \to \infty,\, p \to 0,\, np \to \lambda} P(X_{n,p} \le x) = P(X_\lambda \le x), \text{ for all } x \in \mathbb{R}.$$


Appendix B 

Power Series and Generating Functions 


We review without proof some basic results concerning power series. For more 
details, the reader should consult an advanced calculus or undergraduate analysis 
text. We also illustrate the utility of generating functions by analyzing the one that 
arises from the Fibonacci sequence. 

Let $\{a_n\}_{n=0}^\infty$ be a sequence of real numbers. Define formally the generating
function $F(t)$ of $\{a_n\}_{n=0}^\infty$ by

$$F(t) = \sum_{n=0}^\infty a_n t^n, \tag{B.1}$$

where $t \in \mathbb{R}$. We say “formally” because we have made the definition before
determining for which values of $t$ the power series on the right hand side above
converges. The power series converges trivially for $t = 0$, and it is possible that it
converges only for $t = 0$, for example, if $a_n = n!$.


The power series $\sum_{n=0}^\infty a_n t^n$ converges absolutely if $\sum_{n=0}^\infty |a_n t^n| < \infty$. The
power series is uniformly, absolutely convergent for $|t| \le \rho$ if

$$\lim_{N \to \infty} \sup_{|t| \le \rho} \sum_{n=N}^\infty |a_n t^n| = 0,$$

that is, if the tail of the series $\sum_{n=0}^\infty |a_n t^n|$ converges to 0 uniformly over $|t| \le \rho$.
We state four fundamental results concerning the convergence of power series:


1. If the power series converges for some number $t_0 \ne 0$, then necessarily the power
series converges absolutely and uniformly for $|t| \le \rho$, for all $\rho < |t_0|$.

2. There exists an extended real number $r_0 \in [0, \infty]$ such that the power series
$\sum_{n=0}^\infty a_n t^n$ converges absolutely if $|t| \in [0, r_0)$ and diverges if $|t| > r_0$.

The number $r_0$ in (2) is called the radius of convergence of the power series.

3. The radius of convergence is given by the formula

$$r_0 = \frac{1}{\limsup_{n \to \infty} \sqrt[n]{|a_n|}}.$$

4. If the power series is uniformly, absolutely convergent for $|t| \le \rho$, then the
function $F(t)$ in (B.1) is infinitely differentiable for $|t| < \rho$, and its derivatives
are obtained via term by term differentiation in the power series; in particular,
$F'(t) = \sum_{n=0}^\infty n a_n t^{n-1}$.

The generating function often provides an efficient method for obtaining information
about the sequence $\{a_n\}_{n=0}^\infty$. Typically, this will occur when the generating
function can be written in a nice closed form and analyzed. This analysis then allows
one to obtain information about the coefficients in the generating function’s power
series expansion, and these coefficients are of course $\{a_n\}_{n=0}^\infty$. We illustrate this in
the case of the famous Fibonacci sequence.

Recall that the sequence of Fibonacci numbers is defined recursively by $f_0 = 0$,
$f_1 = 1$ and

$$f_n = f_{n-1} + f_{n-2}, \text{ for } n \ge 2. \tag{B.2}$$

The first few Fibonacci numbers are 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144.

We will obtain a closed form for the generating function

$$F(t) = \sum_{n=0}^\infty f_n t^n \tag{B.3}$$

of the Fibonacci numbers. Multiply both sides of (B.2) by $t^n$ and then sum both
sides over $n$, with $n$ running from 2 to $\infty$. This gives us

$$\sum_{n=2}^\infty f_n t^n = \sum_{n=2}^\infty f_{n-1} t^n + \sum_{n=2}^\infty f_{n-2} t^n.$$

Since $f_0 = 0$ and $f_1 = 1$, the left hand side above is equal to $F(t) - t$. Factoring
out $t$ from the first term and $t^2$ from the second term on the right hand side above,
and using the fact that $f_0 = 0$, one sees that the right hand side above is equal to
$tF(t) + t^2 F(t)$. Thus, we obtain the equation

$$F(t) - t = tF(t) + t^2 F(t),$$

which gives a closed form expression for $F$; namely, $F(t) = \frac{t}{1 - t - t^2}$. Up until now
we have ignored the question of convergence. However, the above formula gives us
the answer. The roots of the polynomial $t^2 + t - 1$ are $r^+ := \frac{-1 + \sqrt{5}}{2}$ and
$r^- := \frac{-1 - \sqrt{5}}{2}$. Since $|r^+| < |r^-|$, we conclude that the generating function $F(t)$ has radius
of convergence $r^+ = \frac{\sqrt{5} - 1}{2}$. Thus, the generating function of the Fibonacci series
is given by

$$F(t) = \frac{t}{1 - t - t^2}, \quad |t| < \frac{\sqrt{5} - 1}{2}. \tag{B.4}$$
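The closed form and the radius of convergence can both be checked numerically. The sketch below compares a long partial sum of $\sum f_n t^n$ with $t/(1 - t - t^2)$ at a point inside the radius, and estimates the radius from the root formula $1/\sqrt[n]{|a_n|}$ at a single large $n$. The evaluation point $t = 0.5$, the cutoffs, and the function names are illustrative assumptions.

```python
import math

def fibonacci(count):
    """Return [f_0, ..., f_{count-1}] using the recursion f_n = f_{n-1} + f_{n-2}."""
    fibs = [0, 1]
    while len(fibs) < count:
        fibs.append(fibs[-1] + fibs[-2])
    return fibs[:count]

def partial_sum(t, terms):
    """Sum of f_n * t^n over n = 0, ..., terms - 1."""
    return sum(f * t**n for n, f in enumerate(fibonacci(terms)))

def closed_form(t):
    """The rational function t / (1 - t - t^2)."""
    return t / (1 - t - t * t)

def root_test_radius(n):
    """Finite-n estimate 1 / f_n^(1/n) of the radius of convergence."""
    return 1.0 / fibonacci(n + 1)[n] ** (1.0 / n)

radius = (math.sqrt(5) - 1) / 2
t = 0.5  # a point with |t| < radius (illustrative choice)
print(partial_sum(t, 200), closed_form(t))
print(root_test_radius(400), radius)
```

The root-test estimate approaches the radius slowly, since $f_n^{1/n}$ differs from its limit by a factor of order $1 + O(1/n)$.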


We now use the method of partial fractions to represent the function $\frac{t}{1 - t - t^2}$ in an
explicit power series. Using the fact that $r^+ r^- = -1$, we write

$$t^2 + t - 1 = (t - r^+)(t - r^-) = -(tr^- + 1)(tr^+ + 1);$$

thus,

$$\frac{t}{1 - t - t^2} = \frac{t}{(tr^- + 1)(tr^+ + 1)}. \tag{B.5}$$

For unknown $A$ and $B$, we write

$$\frac{t}{(tr^- + 1)(tr^+ + 1)} = \frac{A}{tr^- + 1} + \frac{B}{tr^+ + 1} = \frac{t(Ar^+ + Br^-) + (A + B)}{(tr^- + 1)(tr^+ + 1)}. \tag{B.6}$$

Comparing the left-most and right-most terms in (B.6), we conclude that $A + B = 0$
and $Ar^+ + Br^- = 1$. Solving for $A$ and $B$, we obtain $A = \frac{1}{r^+ - r^-} = \frac{1}{\sqrt{5}}$ and
$B = \frac{1}{r^- - r^+} = -\frac{1}{\sqrt{5}}$. Thus, from (B.5) and the first equality in (B.6), we arrive at
the partial fraction representation

$$\frac{t}{1 - t - t^2} = \frac{1}{\sqrt{5}} \left( \frac{1}{1 + tr^-} - \frac{1}{1 + tr^+} \right). \tag{B.7}$$


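Since $|r^-| > |r^+|$, both $\frac{1}{1 + tr^-}$ and $\frac{1}{1 + tr^+}$ can be written as geometric series if
$|t| < \frac{1}{|r^-|} = \frac{2}{1 + \sqrt{5}} = \frac{\sqrt{5} - 1}{2}$. We have

$$\frac{1}{1 + tr^-} = \sum_{n=0}^\infty (-1)^n (r^-)^n t^n = \sum_{n=0}^\infty \left( \frac{1 + \sqrt{5}}{2} \right)^n t^n;$$
$$\frac{1}{1 + tr^+} = \sum_{n=0}^\infty (-1)^n (r^+)^n t^n = \sum_{n=0}^\infty \left( \frac{1 - \sqrt{5}}{2} \right)^n t^n. \tag{B.8}$$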


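Thus, from (B.4), (B.7), and (B.8), we obtain

$$F(t) = \sum_{n=0}^\infty \frac{1}{\sqrt{5}} \left( \left( \frac{1 + \sqrt{5}}{2} \right)^n - \left( \frac{1 - \sqrt{5}}{2} \right)^n \right) t^n. \tag{B.9}$$

Comparing (B.3) with (B.9), we conclude that the $n$th Fibonacci number $f_n$ is
given explicitly by

$$f_n = \frac{1}{\sqrt{5}} \left( \left( \frac{1 + \sqrt{5}}{2} \right)^n - \left( \frac{1 - \sqrt{5}}{2} \right)^n \right). \tag{B.10}$$

From the explicit formula in (B.10), the asymptotic behavior of $f_n$ is clear: since
$\left| \frac{1 - \sqrt{5}}{2} \right| < 1$, one has

$$f_n \sim \frac{1}{\sqrt{5}} \left( \frac{1 + \sqrt{5}}{2} \right)^n, \text{ as } n \to \infty.$$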
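The explicit formula can be compared against the recursion directly. The sketch below evaluates $\frac{1}{\sqrt{5}}\big(\big(\frac{1+\sqrt{5}}{2}\big)^n - \big(\frac{1-\sqrt{5}}{2}\big)^n\big)$ in floating point and rounds it to the nearest integer; the function names and the range of $n$ are illustrative assumptions, and the floating-point evaluation is only reliable for moderate $n$.

```python
import math

def fib_iterative(n):
    """f_n from the recursion f_0 = 0, f_1 = 1, f_n = f_{n-1} + f_{n-2}."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def fib_closed_form(n):
    """Floating-point evaluation of the explicit formula for f_n."""
    sqrt5 = math.sqrt(5.0)
    return (((1 + sqrt5) / 2) ** n - ((1 - sqrt5) / 2) ** n) / sqrt5

def asymptotic_ratio(n):
    """f_n divided by its leading-order approximation ((1 + sqrt 5)/2)^n / sqrt 5."""
    sqrt5 = math.sqrt(5.0)
    return fib_iterative(n) / (((1 + sqrt5) / 2) ** n / sqrt5)

for n in (1, 5, 12, 30):
    print(n, fib_iterative(n), fib_closed_form(n), asymptotic_ratio(n))
```

The ratio in the last column tends to 1 geometrically fast, because the neglected term decays like $\big|\frac{1-\sqrt{5}}{2}\big|^n$.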


Appendix C 

A Proof of Stirling’s Formula 


Stirling’s formula states that

$$n! \sim n^n e^{-n} \sqrt{2\pi n}, \text{ as } n \to \infty. \tag{C.1}$$
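To see how quickly the two sides of Stirling's formula approach one another, the following sketch computes the ratio $n! / (n^n e^{-n} \sqrt{2\pi n})$ for a few values of $n$. It works with logarithms, using `math.lgamma(n + 1)` for $\log n!$, to avoid overflow; the function name and the sample values of $n$ are illustrative assumptions.

```python
import math

def stirling_ratio(n):
    """n! / (n^n e^{-n} sqrt(2 pi n)), computed via logarithms to avoid overflow."""
    log_factorial = math.lgamma(n + 1)  # log(n!)
    log_stirling = n * math.log(n) - n + 0.5 * math.log(2 * math.pi * n)
    return math.exp(log_factorial - log_stirling)

for n in (1, 10, 100, 1000):
    print(n, stirling_ratio(n))
```

The ratio decreases toward 1 as $n$ grows, consistent with the meaning of the symbol $\sim$ in the statement.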


In order to obtain an asymptotic formula for the discrete quantity $n!$, it is extremely
useful to be able to embed this quantity in a function of a continuous variable.
Integrating by parts and then applying induction shows that $n! = \Gamma(n + 1)$, $n \in \mathbb{N}$,
where the gamma function $\Gamma(t)$ is defined by

$$\Gamma(t) = \int_0^\infty x^{t-1} e^{-x} \, dx, \quad t > 0.$$

Thus, one proves Stirling’s formula in the following form.

Theorem C.1 (Stirling’s Formula).

$$\Gamma(t + 1) \sim t^t e^{-t} \sqrt{2\pi t}, \text{ as } t \to \infty. \tag{C.2}$$


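Proof. In the literature one can find literally dozens of proofs of Stirling’s formula.
We present here an elementary proof that uses Laplace’s asymptotic method [14].
We begin by giving the intuition for the method. We write

$$\Gamma(t + 1) = \int_0^\infty x^t e^{-x} \, dx = \int_0^\infty e^{\psi_t(x)} \, dx, \tag{C.3}$$

where

$$\psi_t(x) = t \log x - x.$$

Now $\psi_t$ takes on its maximum at $x = t$, and the Taylor expansion of $\psi_t$ about $x = t$
starts out as

$$t \log t - t - \frac{(x - t)^2}{2t} =: \hat{\psi}_t(x).$$

Replacing $\psi_t$ by $\hat{\psi}_t$, we calculate that

$$\int_0^\infty e^{\hat{\psi}_t(x)} \, dx = \int_0^\infty e^{t \log t - t - \frac{(x - t)^2}{2t}} \, dx = t^t e^{-t} \int_0^\infty e^{-\frac{(x - t)^2}{2t}} \, dx.$$

Making the substitution $z = \frac{x - t}{\sqrt{t}}$ gives

$$\int_0^\infty e^{-\frac{(x - t)^2}{2t}} \, dx = \sqrt{t} \int_{-\sqrt{t}}^\infty e^{-\frac{1}{2} z^2} \, dz.$$

Since $\int_{-\infty}^\infty e^{-\frac{1}{2} z^2} \, dz = \sqrt{2\pi}$, we conclude that

$$\int_0^\infty e^{\hat{\psi}_t(x)} \, dx \sim t^t e^{-t} \sqrt{2\pi t}, \text{ as } t \to \infty.$$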

We now turn to the rigorous proof. We can write $\psi_t$ exactly as

$$\psi_t(t + y) = t \log t - t - t g\left( \frac{y}{t} \right),$$

where

$$g(v) = v - \log(1 + v).$$

Substituting this in (C.3) and making the change of variables $x = y + t$, we obtain

$$\Gamma(t + 1) = t^t e^{-t} \int_{-t}^\infty e^{-t g(\frac{y}{t})} \, dy.$$

Making the change of variables $y = \sqrt{t} z$, we have

$$\Gamma(t + 1) = t^t e^{-t} \sqrt{2\pi t} \, \hat{\Gamma}(t), \tag{C.4}$$

where

$$\hat{\Gamma}(t) = \frac{1}{\sqrt{2\pi}} \int_{-\sqrt{t}}^\infty e^{-t g(\frac{z}{\sqrt{t}})} \, dz.$$

We will show that

$$\lim_{t \to \infty} \hat{\Gamma}(t) = 1. \tag{C.5}$$

Now (C.2) follows from (C.4) and (C.5).
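The limit (C.5) can be observed numerically. The sketch below approximates the normalized integral $\frac{1}{\sqrt{2\pi}} \int_{-\sqrt{t}}^\infty e^{-t g(z/\sqrt{t})} \, dz$, with $g(v) = v - \log(1 + v)$, by a midpoint rule and compares it with the exact ratio $\Gamma(t+1) / (t^t e^{-t} \sqrt{2\pi t})$ obtained from `math.lgamma`. The truncation point `upper`, the number of subintervals, and all function names are assumptions made for illustration; the truncation is adequate only for the moderate values of $t$ used here.

```python
import math

def g(v):
    """g(v) = v - log(1 + v), defined for v > -1."""
    return v - math.log1p(v)

def gamma_hat(t, upper=40.0, steps=20000):
    """Midpoint-rule approximation of the normalized integral over [-sqrt(t), upper].

    Midpoints avoid the left endpoint z = -sqrt(t), where log(1 + z/sqrt(t)) is
    undefined (the integrand tends to 0 there).
    """
    root_t = math.sqrt(t)
    lower = -root_t
    h = (upper - lower) / steps
    total = 0.0
    for k in range(steps):
        z = lower + (k + 0.5) * h
        total += math.exp(-t * g(z / root_t))
    return total * h / math.sqrt(2 * math.pi)

def exact_ratio(t):
    """Gamma(t + 1) / (t^t e^{-t} sqrt(2 pi t)), computed via logarithms."""
    log_stirling = t * math.log(t) - t + 0.5 * math.log(2 * math.pi * t)
    return math.exp(math.lgamma(t + 1) - log_stirling)

for t in (1.0, 25.0, 100.0, 400.0):
    print(t, gamma_hat(t), exact_ratio(t))
```

Both columns should agree closely and drift toward 1 as $t$ increases, which is the content of (C.4) and (C.5) taken together.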

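Fix $L > 0$ and write

$$\hat{\Gamma}(t) = \hat{\Gamma}_L(t) + \frac{1}{\sqrt{2\pi}} T_L^+(t) + \frac{1}{\sqrt{2\pi}} T_L^-(t), \tag{C.6}$$

where

$$\hat{\Gamma}_L(t) = \frac{1}{\sqrt{2\pi}} \int_{-L}^{L} e^{-t g(\frac{z}{\sqrt{t}})} \, dz$$

and

$$T_L^+(t) = \int_L^\infty e^{-t g(\frac{z}{\sqrt{t}})} \, dz, \quad T_L^-(t) = \int_{-\sqrt{t}}^{-L} e^{-t g(\frac{z}{\sqrt{t}})} \, dz.$$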

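From Taylor’s remainder formula it follows that for any $\epsilon > 0$ and sufficiently small
$v$, one has

$$\frac{1}{2}(1 - \epsilon) v^2 \le g(v) \le \frac{1}{2}(1 + \epsilon) v^2.$$

Thus, $\lim_{t \to \infty} t g(\frac{z}{\sqrt{t}}) = \frac{1}{2} z^2$, uniformly over $z \in [-L, L]$; consequently,

$$\lim_{t \to \infty} \hat{\Gamma}_L(t) = \frac{1}{\sqrt{2\pi}} \int_{-L}^{L} e^{-\frac{1}{2} z^2} \, dz. \tag{C.7}$$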


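Since $t \big( g(\frac{z}{\sqrt{t}}) \big)' = \sqrt{t} \big( 1 - \frac{1}{1 + \frac{z}{\sqrt{t}}} \big) = \frac{\sqrt{t} z}{\sqrt{t} + z}$ is increasing in $z$, we have

$$T_L^+(t) \le \frac{\sqrt{t} + L}{\sqrt{t} L} \int_L^\infty t \Big( g\big(\tfrac{z}{\sqrt{t}}\big) \Big)' e^{-t g(\frac{z}{\sqrt{t}})} \, dz = \frac{\sqrt{t} + L}{\sqrt{t} L} e^{-t g(\frac{L}{\sqrt{t}})} = \frac{\sqrt{t} + L}{\sqrt{t} L} e^{-\sqrt{t} L + t \log(1 + \frac{L}{\sqrt{t}})}.$$

By Taylor’s formula, we have $\log(1 + \frac{L}{\sqrt{t}}) = \frac{L}{\sqrt{t}} - \frac{L^2}{2t} + O(t^{-\frac{3}{2}})$ as $t \to \infty$; thus,

$$\limsup_{t \to \infty} T_L^+(t) \le \frac{1}{L} e^{-\frac{1}{2} L^2}. \tag{C.8}$$

A very similar argument gives

$$\limsup_{t \to \infty} T_L^-(t) \le \frac{1}{L} e^{-\frac{1}{2} L^2}. \tag{C.9}$$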


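Now from (C.6)-(C.9), we obtain

$$\frac{1}{\sqrt{2\pi}} \int_{-L}^{L} e^{-\frac{1}{2} z^2} \, dz \le \liminf_{t \to \infty} \hat{\Gamma}(t) \le \limsup_{t \to \infty} \hat{\Gamma}(t) \le \frac{1}{\sqrt{2\pi}} \int_{-L}^{L} e^{-\frac{1}{2} z^2} \, dz + \frac{1}{L} \sqrt{\frac{2}{\pi}} \, e^{-\frac{1}{2} L^2}.$$

Since $\hat{\Gamma}(t)$ is independent of $L$, letting $L \to \infty$ above gives (C.5).

□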


Appendix D 

An Elementary Proof of $\sum_{n=1}^\infty \frac{1}{n^2} = \frac{\pi^2}{6}$


The standard way to prove the identity in the title of this appendix is via Fourier
series. We give a completely elementary proof, following [1]. Consider the double
integral

$$I = \int_0^1 \int_0^1 \frac{1}{1 - xy} \, dx \, dy. \tag{D.1}$$


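(Actually, the expression on the right hand side of (D.1) is an improper integral,
because the integrand blows up at $(x, y) = (1, 1)$. Thus, $\int_0^1 \int_0^1 \frac{1}{1 - xy} \, dx \, dy :=
\lim_{\epsilon \to 0^+} \int_0^{1 - \epsilon} \int_0^{1 - \epsilon} \frac{1}{1 - xy} \, dx \, dy$. Since the integrand is nonnegative, there is no
problem applying the standard rules of calculus directly to $\int_0^1 \int_0^1 \frac{1}{1 - xy} \, dx \, dy$.) On
the one hand, expanding the integrand in a geometric series and integrating term by
term gives

$$I = \int_0^1 \int_0^1 \sum_{n=0}^\infty (xy)^n \, dx \, dy = \sum_{n=0}^\infty \int_0^1 \int_0^1 x^n y^n \, dx \, dy = \sum_{n=0}^\infty \Big( \int_0^1 x^n \, dx \Big) \Big( \int_0^1 y^n \, dy \Big) = \sum_{n=0}^\infty \frac{1}{(n + 1)^2} = \sum_{n=1}^\infty \frac{1}{n^2}. \tag{D.2}$$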


(The interchanging of the order of the integration and the summation is justified by 
the fact that all the summands are nonnegative.) 
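The series in (D.2) converges slowly, which the following sketch makes visible by comparing partial sums of $\sum 1/(n+1)^2$ with the value $\pi^2/6$ announced in the title of this appendix. The function name and the chosen cutoffs are illustrative assumptions.

```python
import math

def partial_sum(terms):
    """Sum of 1/(n + 1)^2 for n = 0, ..., terms - 1."""
    return sum(1.0 / (n + 1) ** 2 for n in range(terms))

target = math.pi ** 2 / 6
for terms in (10, 1000, 100000):
    value = partial_sum(terms)
    print(terms, value, target - value)
```

The gap after $N$ terms is the tail $\sum_{n > N} 1/n^2$, which lies between $1/(N+1)$ and $1/N$, so each additional correct digit costs roughly ten times as many terms.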

On the other hand, consider the change of variables $u = \frac{x + y}{2}$, $v = \frac{y - x}{2}$. This
transformation rotates the square $[0, 1] \times [0, 1]$ clockwise by 45° and shrinks its sides
by the factor $\sqrt{2}$. The new domain is $\{(u, v) : 0 \le u \le \frac{1}{2}, -u \le v \le u\} \cup \{(u, v) :
\frac{1}{2} \le u \le 1, u - 1 \le v \le 1 - u\}$. The Jacobian $\big| \frac{\partial(x, y)}{\partial(u, v)} \big|$ of the transformation is equal to
2, so the area element $dx \, dy$ gets replaced by $2 \, du \, dv$. The function $\frac{1}{1 - xy}$ becomes
$\frac{1}{1 - u^2 + v^2}$. Since the function and the domain are symmetric with respect to the $u$-axis,
we have

$$I = 4 \int_0^{\frac{1}{2}} \Big( \int_0^u \frac{dv}{1 - u^2 + v^2} \Big) \, du + 4 \int_{\frac{1}{2}}^1 \Big( \int_0^{1 - u} \frac{dv}{1 - u^2 + v^2} \Big) \, du.$$

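Using the integration formula $\int \frac{dx}{x^2 + a^2} = \frac{1}{a} \arctan \frac{x}{a}$, we obtain

$$I = 4 \int_0^{\frac{1}{2}} \frac{1}{\sqrt{1 - u^2}} \arctan \Big( \frac{u}{\sqrt{1 - u^2}} \Big) \, du + 4 \int_{\frac{1}{2}}^1 \frac{1}{\sqrt{1 - u^2}} \arctan \Big( \frac{1 - u}{\sqrt{1 - u^2}} \Big) \, du.$$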


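Now the derivative of $g(u) := \arctan \big( \frac{u}{\sqrt{1 - u^2}} \big)$ is $\frac{1}{\sqrt{1 - u^2}}$, and the derivative of
$h(u) := \arctan \big( \frac{1 - u}{\sqrt{1 - u^2}} \big) = \arctan \big( \sqrt{\frac{1 - u}{1 + u}} \big)$ is $-\frac{1}{2} \frac{1}{\sqrt{1 - u^2}}$.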
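The two derivative formulas are easy to get wrong by a sign or a factor of 2, so a numerical check is worthwhile. The sketch below compares central finite differences of $\arctan\big(u/\sqrt{1-u^2}\big)$ and $\arctan\big((1-u)/\sqrt{1-u^2}\big)$ with $1/\sqrt{1-u^2}$ and $-\frac{1}{2} \cdot 1/\sqrt{1-u^2}$, respectively. The step size, the sample points, and the function names are illustrative assumptions.

```python
import math

def g(u):
    """arctan(u / sqrt(1 - u^2)) for 0 <= u < 1."""
    return math.atan(u / math.sqrt(1 - u * u))

def h(u):
    """arctan((1 - u) / sqrt(1 - u^2)) for 0 <= u < 1."""
    return math.atan((1 - u) / math.sqrt(1 - u * u))

def central_difference(f, u, step=1e-6):
    """Symmetric finite-difference approximation to f'(u)."""
    return (f(u + step) - f(u - step)) / (2 * step)

for u in (0.1, 0.5, 0.9):
    claimed_g = 1 / math.sqrt(1 - u * u)
    claimed_h = -0.5 / math.sqrt(1 - u * u)
    print(u, central_difference(g, u) - claimed_g, central_difference(h, u) - claimed_h)
```

Both printed differences should be at the level of finite-difference error, many orders of magnitude below the derivatives themselves.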


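Thus, we conclude that

$$\begin{aligned} I &= 4 \int_0^{\frac{1}{2}} g(u) g'(u) \, du - 8 \int_{\frac{1}{2}}^1 h(u) h'(u) \, du = 2 g^2(u) \Big|_0^{\frac{1}{2}} - 4 h^2(u) \Big|_{\frac{1}{2}}^1 \\ &= 2 \Big( \arctan^2 \frac{1}{\sqrt{3}} - \arctan^2 0 \Big) - 4 \Big( \arctan^2 0 - \arctan^2 \frac{1}{\sqrt{3}} \Big) = 6 \arctan^2 \frac{1}{\sqrt{3}} = 6 \Big( \frac{\pi}{6} \Big)^2 = \frac{\pi^2}{6}. \end{aligned} \tag{D.3}$$

Comparing (D.2) and (D.3) gives

$$\sum_{n=1}^\infty \frac{1}{n^2} = \frac{\pi^2}{6}.$$

A

arcsine distribution, 37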
average order, 13

B

Bernoulli random variable, 138
binomial random variable, 138
branching process - see Galton-Watson branching process, 117

C

Chebyshev’s ψ-function, 70
Chebyshev’s θ-function, 68
Chebyshev’s inequality, 134
Chebyshev’s theorem, 68
Chinese remainder theorem, 19
clique, 89
coloring of a graph, 104
composition of an integer, 5
cycle index, 58
cycle type, 51

D

derangement, 49
Dyck path, 40

E

Erdős-Rényi graph, 89
Euler φ-function, 11
Euler product formula, 19
Ewens sampling formula, 52
expected value, 134
extinction, 117

F

Fibonacci sequence, 142
finite graph, 89

G

Galton-Watson branching process, 117
generating function, 141
giant component, 110

H

Hardy-Ramanujan theorem, 81

I

independent events, 133
independent random variables, 135

L

large deviations, 113

M

Mertens’ theorems, 75
Möbius function, 8
Möbius inversion, 10
multiplicative function, 9

P

p-adic, 71
partition of an integer, 1
Poisson approximation to the binomial distribution, 138
Poisson random variable, 138
prime number theorem, 67
probability generating function, 54
probability space, 133

R

Ramsey number, 105
random variable, 133
relative entropy, 115, 131
restricted partition of an integer, 1

S

sieve method, 19
simple, symmetric random walk, 35
square-free integer, 8
Stirling numbers of the first kind, 54
Stirling’s formula, 145
survival, 117

T

tampering detection, 99
total variation distance, 99

V

variance, 134

W

weak convergence, 139
weak law of large numbers, 136, 137


